# Pandas

- Pandas is a high-level data manipulation package built on top of NumPy (creator: Wes McKinney).
- A DataFrame can hold a different data type in each column, which gets around the limitation of 2D NumPy arrays, where every element must share a single data type.
- It is built around the concept of the DataFrame.
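
The difference in data type handling can be seen in a small sketch. The two-row sample below is made up for illustration; it is not the full dataset used later.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype, so mixing strings and floats
# coerces every element to a string.
arr = np.array([["Brazil", 8.516], ["Russia", 17.10]])
print(arr.dtype.kind)  # 'U' means unicode string

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({"country": ["Brazil", "Russia"], "area": [8.516, 17.10]})
print(df.dtypes)
```

The `area` column stays numeric in the DataFrame, so arithmetic such as `df["area"].sum()` works without any conversion.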

## DataFrame

You can build a DataFrame manually from a dictionary:

- keys are column labels
- values are data, column by column

In [4]:
data = {
 "country":["Brazil", "Russia", "India", "China", "South Africa"],
 "capital":["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
 "area":[8.516, 17.10, 3.286, 9.597, 1.221],
 "population":[200.4, 143.5, 1252, 1357, 52.98] }

data

{'area': [8.516, 17.1, 3.286, 9.597, 1.221],
 'capital': ['Brasilia', 'Moscow', 'New Delhi', 'Beijing', 'Pretoria'],
 'country': ['Brazil', 'Russia', 'India', 'China', 'South Africa'],
 'population': [200.4, 143.5, 1252, 1357, 52.98]}

In [5]:
import pandas as pd

brics = pd.DataFrame(data)
brics

| | area | capital | country | population |
|---|---|---|---|---|
| 0 | 8.516 | Brasilia | Brazil | 200.4 |
| 1 | 17.1 | Moscow | Russia | 143.5 |
| 2 | 3.286 | New Delhi | India | 1252.0 |
| 3 | 9.597 | Beijing | China | 1357.0 |
| 4 | 1.221 | Pretoria | South Africa | 52.98 |


In [6]:
# You can change the index by assigning new labels to the index attribute.

brics.index = ["BR", "RU", "IN", "CH", "SA"]
brics

| | area | capital | country | population |
|---|---|---|---|---|
| BR | 8.516 | Brasilia | Brazil | 200.4 |
| RU | 17.1 | Moscow | Russia | 143.5 |
| IN | 3.286 | New Delhi | India | 1252.0 |
| CH | 9.597 | Beijing | China | 1357.0 |
| SA | 1.221 | Pretoria | South Africa | 52.98 |
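
As an alternative to assigning the index afterwards, the labels can be passed to the constructor through its `index` parameter. The sketch below uses a two-column subset of the data, and the variable names are chosen only for this example.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}
labels = ["BR", "RU", "IN", "CH", "SA"]

# Option 1: set the row labels while building the DataFrame.
brics_labelled = pd.DataFrame(data, index=labels)

# Option 2: build with the default 0..4 index, then overwrite it.
brics_assigned = pd.DataFrame(data)
brics_assigned.index = labels

print(brics_labelled.equals(brics_assigned))
```

Both routes should give the same DataFrame; passing `index` up front simply avoids the intermediate integer index.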


### DataFrame from CSV file

In [11]:
brics = pd.read_csv("datasets/brics.csv") # without index_col, the DataFrame gets the default integer index 0-4

brics = pd.read_csv("datasets/brics.csv", index_col = 0) # now the first column of the file is used as the index

brics

| | country | capital | area | population |
|---|---|---|---|---|
| BR | Brazil | Brasilia | 8.516 | 200.4 |
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |
| SA | South Africa | Pretoria | 1.221 | 52.98 |
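
The effect of `index_col` can be tried without the file on disk. The sketch below reads from an in-memory string instead. The CSV text is a hypothetical two-row stand-in for `datasets/brics.csv`, and it assumes the file's first header cell is empty.

```python
import io
import pandas as pd

# Hypothetical stand-in for datasets/brics.csv (first header cell is empty).
csv_text = """,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
"""

# Without index_col the labels land in an ordinary column,
# which pandas names "Unnamed: 0" because its header is empty.
default_index = pd.read_csv(io.StringIO(csv_text))
print(list(default_index.columns))
print(list(default_index.index))

# With index_col=0 the first column becomes the row labels.
label_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(label_index.columns))
print(list(label_index.index))
```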


## Index and Select Data
- Square brackets
- Advanced indexers
    - loc
    - iloc

### Column Access [ ]

In [14]:
#1 Series: Column Access []
brics["country"] 

BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object

In [13]:
type(brics["country"]) # result is a pandas Series object: a one-dimensional labelled array

pandas.core.series.Series

In [15]:
#2 DataFrame: Column Access [[]] - if we need the result as a DataFrame, use double brackets
brics[["country"]]

| | country |
|---|---|
| BR | Brazil |
| RU | Russia |
| IN | India |
| CH | China |
| SA | South Africa |


In [16]:
type(brics[["country"]])

pandas.core.frame.DataFrame

In [18]:
brics[["country"]].shape

(5, 1)

In [20]:
# The inner brackets are a Python list of column labels placed inside the indexing brackets
brics[["country","capital"]]

| | country | capital |
|---|---|---|
| BR | Brazil | Brasilia |
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
| SA | South Africa | Pretoria |
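
The single-bracket and double-bracket forms can be compared side by side. This is a minimal sketch on a two-row sample built only for this example.

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

as_series = brics["country"]    # a string key returns a Series
as_frame = brics[["country"]]   # a list key returns a DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```

The shapes show the difference: the Series is one-dimensional, while the DataFrame keeps a column axis even when it holds one column.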


### Row Access [ ]

With square brackets, rows can only be selected by slicing.

In [21]:
brics[1:4]  # 2nd through 4th rows (positions 1 to 3); positions start at 0 and the end of the slice is excluded

| | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |
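
To see why slicing is the only way to reach rows with plain square brackets, the sketch below tries a row label directly. It uses a three-row sample built for this example; the flag `row_lookup_failed` is a name introduced here for illustration.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India"],
        "capital": ["Brasilia", "Moscow", "New Delhi"],
    },
    index=["BR", "RU", "IN"],
)

# A slice inside square brackets selects rows by position.
first_two = brics[0:2]
print(list(first_two.index))

# A plain string inside square brackets is looked up as a column label,
# so a row label such as "RU" is not found.
row_lookup_failed = False
try:
    brics["RU"]
except KeyError:
    row_lookup_failed = True
print(row_lookup_failed)
```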


- Square brackets: limited functionality
- Ideally, we want indexing like that of 2D NumPy arrays
    - my_array[rows, columns]
- Pandas offers this through
    - loc (label-based)
    - iloc (integer position-based)
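
The parallel with NumPy's `my_array[rows, columns]` can be sketched as follows. The array contents and the labels `a` to `d` and `x` to `z` are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

my_array = np.arange(12).reshape(4, 3)
df = pd.DataFrame(my_array, index=list("abcd"), columns=["x", "y", "z"])

# NumPy: rows 1-2 and columns 0-1 in a single indexing expression.
sub_array = my_array[1:3, 0:2]

# pandas: the same selection by integer position ...
sub_iloc = df.iloc[1:3, 0:2]

# ... and by label.
sub_loc = df.loc[["b", "c"], ["x", "y"]]

print(sub_array)
print(sub_iloc)
print(sub_loc)
```

All three expressions pick out the same 2x2 block; only the way rows and columns are named differs.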

#### Row Access with loc

In [27]:
brics.loc["RU"] # index label

country       Russia
capital       Moscow
area            17.1
population     143.5
Name: RU, dtype: object

In [23]:
type(brics.loc["RU"])

pandas.core.series.Series

In [24]:
brics.loc[["RU"]] # for a DataFrame result, use double brackets

| | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |


In [25]:
brics.loc[["RU", "IN", "CH"]]  # a list of index labels selects multiple rows and returns a DataFrame

| | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |


In [29]:
# Row & Column loc

# We pass a list of index labels for the ROWS and a list of column labels for the COLUMNS
brics.loc[["RU", "IN", "CH"], ["country", "capital"]] 

| | country | capital |
|---|---|---|
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |


In [30]:
brics.loc[:, ["country", "capital"]] # the : slice selects all rows

| | country | capital |
|---|---|---|
| BR | Brazil | Brasilia |
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
| SA | South Africa | Pretoria |


#### Recap
- Square brackets
    - Column access - brics[["country", "capital"]]
    - Row access: only through slicing - brics[1:4]
- loc (label-based)
    - Row access - brics.loc[["RU", "IN", "CH"]]
    - Column access - brics.loc[ :, ["country", "capital"]]
    - Row & Column access - brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

#### Row Access with iloc

In [31]:
brics.iloc[[1]] # same as brics.loc[["RU"]], but selects the row by integer position instead of label

| | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |


In [32]:
brics.iloc[[1,2,3]] # same as brics.loc[["RU", "IN", "CH"]], but selects the rows by integer position instead of labels

| | country | capital | area | population |
|---|---|---|---|---|
| RU | Russia | Moscow | 17.1 | 143.5 |
| IN | India | New Delhi | 3.286 | 1252.0 |
| CH | China | Beijing | 9.597 | 1357.0 |


#### Row & Column iloc

In [33]:
# same as brics.loc[["RU", "IN", "CH"], ["country", "capital"]], but with lists of integer positions for the rows and columns instead of labels
brics.iloc[[1,2,3], [0, 1]] 

| | country | capital |
|---|---|---|
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |


In [34]:
# same as brics.loc[:, ["country", "capital"]]: the : slice selects all rows, and the columns are given by integer position

brics.iloc[:, [0,1]] 

| | country | capital |
|---|---|---|
| BR | Brazil | Brasilia |
| RU | Russia | Moscow |
| IN | India | New Delhi |
| CH | China | Beijing |
| SA | South Africa | Pretoria |
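
The equivalence between the `loc` and `iloc` selections can be checked directly. The sketch below rebuilds the DataFrame from the dictionary so that it runs on its own; it also shows one possible way to translate labels into positions with `Index.get_indexer`, which is an addition for illustration rather than part of the walkthrough above.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Same selection, once by label and once by integer position.
by_label = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
by_position = brics.iloc[[1, 2, 3], [0, 1]]
print(by_label.equals(by_position))

# Labels can be converted to positions when only the labels are known.
rows = brics.index.get_indexer(["RU", "IN", "CH"])
cols = brics.columns.get_indexer(["country", "capital"])
print(list(rows), list(cols))
print(brics.iloc[rows, cols].equals(by_label))
```

`loc` is usually the more readable choice when labels are meaningful, while `iloc` is handy when only the position in the table matters.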
