# Pandas Basics

* Pandas is a high-level `data manipulation tool`.
    * It is built on the `NumPy` package.
    * Its key data structure is the `DataFrame`.
* `DataFrames` let you store and manipulate `tabular data`: each row is an observation and each column is a variable.

## Create a DataFrame

### Through a dictionary

In [1]:
import pandas as pd

# Keys become column names; each list holds one column's values.
# The name `brics_data` avoids shadowing Python's built-in `dict`.
brics_data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
              "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
              "area": [8.516, 17.10, 3.286, 9.597, 1.221],        # million km^2
              "population": [200.4, 143.5, 1252, 1357, 52.98]}    # millions

brics = pd.DataFrame(brics_data)
print(brics)

        country    capital    area  population
0        Brazil   Brasilia   8.516      200.40
1        Russia     Moscow  17.100      143.50
2         India  New Delhi   3.286     1252.00
3         China    Beijing   9.597     1357.00
4  South Africa   Pretoria   1.221       52.98


In [2]:
# Replace the default integer index (0 to 4) with custom row labels.
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
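
The row labels can also be supplied when the DataFrame is built, rather than assigned afterwards. The following is a minimal sketch of that alternative; it uses only two of the columns from the example, and the variable name `brics_data` is chosen here for illustration.

```python
import pandas as pd

# A two-column subset of the BRICS data; population is in millions.
brics_data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
}

# The `index` argument sets the row labels at construction time.
brics = pd.DataFrame(brics_data, index=["BR", "RU", "IN", "CH", "SA"])

print(brics.shape)          # (number of rows, number of columns)
print(list(brics.columns))  # column names come from the dictionary keys
print(list(brics.index))    # row labels come from the `index` argument
```

Setting the index up front keeps the labels next to the data they describe, which may be easier to read than a separate assignment to `brics.index`.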


### Through a CSV file

In [3]:
cars = pd.read_csv("cars.csv")
print(cars)

  Unnamed: 0  cars_per_cap       country  drives_right
0         US           809  UnitedStates          True
1        AUS           731     Australia         False
2        JAP           588         Japan         False
3         IN            18         India         False
4         RU           200        Russia          True
5        MOR            70       Morocco          True
6         EG            45         Egypt          True


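## Indexing DataFrames

The first column of `cars.csv` has no header name, so pandas loaded it as a regular column labelled `Unnamed: 0`. Passing `index_col=0` tells `read_csv` to use that first column as the row labels instead.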

In [5]:
cars = pd.read_csv("cars.csv", index_col=0)
print(cars)

     cars_per_cap       country  drives_right
US            809  UnitedStates          True
AUS           731     Australia         False
JAP           588         Japan         False
IN             18         India         False
RU            200        Russia          True
MOR            70       Morocco          True
EG             45         Egypt          True
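
The effect of `index_col` can be reproduced without the file itself. The sketch below assumes that `cars.csv` starts with an empty header field, which is what typically produces the `Unnamed: 0` column; the in-memory text `csv_text` is a hypothetical three-row stand-in, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for cars.csv: the header line starts with an
# empty field, so the first column has no name.
csv_text = """,cars_per_cap,country,drives_right
US,809,UnitedStates,True
AUS,731,Australia,False
JAP,588,Japan,False
"""

# Default behaviour: integer index, first column named "Unnamed: 0".
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row labels.
labelled = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_index.columns))
print(list(labelled.index))
print(list(labelled.columns))
```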


### Using square brackets

#### Print column as Pandas Series

In [6]:
print(cars["cars_per_cap"])

US     809
AUS    731
JAP    588
IN      18
RU     200
MOR     70
EG      45
Name: cars_per_cap, dtype: int64


#### Print column as Pandas DataFrame

In [7]:
print(cars[["cars_per_cap"]])

     cars_per_cap
US            809
AUS           731
JAP           588
IN             18
RU            200
MOR            70
EG             45


In [8]:
print(cars[["cars_per_cap", "country"]])

     cars_per_cap       country
US            809  UnitedStates
AUS           731     Australia
JAP           588         Japan
IN             18         India
RU            200        Russia
MOR            70       Morocco
EG             45         Egypt
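
The difference between single and double square brackets is the type of the result. The sketch below checks this on a small, hand-built version of the `cars` data (three rows only, constructed inline so the example runs without `cars.csv`).

```python
import pandas as pd

# Small inline stand-in for the cars DataFrame.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["UnitedStates", "Australia", "Japan"],
    },
    index=["US", "AUS", "JAP"],
)

as_series = cars["cars_per_cap"]    # single brackets: one-dimensional Series
as_frame = cars[["cars_per_cap"]]   # double brackets: one-column DataFrame

print(type(as_series).__name__)     # Series
print(type(as_frame).__name__)      # DataFrame
print(as_series.shape)              # (3,)
print(as_frame.shape)               # (3, 1)
```

The inner pair of brackets is an ordinary Python list of column names, which is why it can hold more than one name.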


## Access rows

### Using square brackets

In [11]:
# A slice selects rows by position: rows 0 to 3 (the stop value 4 is excluded).
print(cars[0:4])

     cars_per_cap       country  drives_right
US            809  UnitedStates          True
AUS           731     Australia         False
JAP           588         Japan         False
IN             18         India         False


In [12]:
print(cars[4:6])

     cars_per_cap  country  drives_right
RU            200   Russia          True
MOR            70  Morocco          True
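
Square brackets select rows only when given a slice; a single integer is looked up as a column label instead. The sketch below illustrates both cases on a small inline stand-in for `cars` (four rows, built by hand for this example).

```python
import pandas as pd

# Small inline stand-in for the cars DataFrame.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18],
        "country": ["UnitedStates", "Australia", "Japan", "India"],
    },
    index=["US", "AUS", "JAP", "IN"],
)

# A slice picks rows by position; the stop value is excluded.
first_two = cars[0:2]
print(list(first_two.index))   # ['US', 'AUS']

# To get a single row with square brackets, use a one-element slice.
second_row = cars[1:2]
print(list(second_row.index))  # ['AUS']

# A bare integer is treated as a column label, so this raises KeyError.
try:
    cars[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)           # True
```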


### Via `loc` and `iloc`

In [13]:
# iloc selects by integer position: index 2 is the third row, returned as a Series.
print(cars.iloc[2])

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object


In [14]:
# loc selects by row label; a list of labels returns a DataFrame.
print(cars.loc[["AUS", "EG"]])

     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True
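
Both `loc` and `iloc` also accept a second argument that selects columns, and they treat slices differently. The following sketch compares the two on a small inline stand-in for `cars` (four rows, built by hand for this example).

```python
import pandas as pd

# Small inline stand-in for the cars DataFrame.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18],
        "country": ["UnitedStates", "Australia", "Japan", "India"],
        "drives_right": [True, False, False, False],
    },
    index=["US", "AUS", "JAP", "IN"],
)

# Rows and columns by label (loc) and by position (iloc).
by_label = cars.loc[["AUS", "IN"], ["country", "drives_right"]]
by_position = cars.iloc[[1, 3], [1, 2]]
print(by_label.equals(by_position))  # True

# A single label returns a Series; a list of labels returns a DataFrame.
row_series = cars.loc["JAP"]
row_frame = cars.loc[["JAP"]]
print(type(row_series).__name__)     # Series
print(type(row_frame).__name__)      # DataFrame

# Label slices include the end label; position slices exclude the stop.
print(len(cars.loc["US":"JAP"]))     # 3
print(len(cars.iloc[0:2]))           # 2
```

Label-based selection with `loc` keeps working if the row order changes, whereas `iloc` depends on the current position of each row.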
