# Pandas

## Series - 1D frames

A `Series` accepts a list (among other inputs) and returns its values paired with index labels, `0` to `n - 1` by default.

In [1]:
import pandas as pd

In [9]:
names = pd.Series(['Jane', 'Tracy', 'John', 'Liz', 'Hannan', 'Hassan', 'Joyce', 'Peter', 'Ann'])

In [11]:
ages = pd.Series([22, 19, 34, 18, 22, 22, 25, 30, 29])
ages

0    22
1    19
2    34
3    18
4    22
5    22
6    25
7    30
8    29
dtype: int64

In [13]:
ratings = pd.Series([2, 3, 1, 4, 5, 4, 4, 3, 3])

In [14]:
fav_food = pd.Series(['Apples', 'Bananas', 'Salad', 'Chicken', 'Shawarma', 'Berries', 'Pizza', 'Coke', 'Fruit juice'])
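
The index does not have to be the default `0, 1, 2, ...`. A minimal sketch, assuming a few made-up labels passed through the `index` parameter (`ages_by_name` is a hypothetical name used only here):

```python
import pandas as pd

# Default behaviour: the labels are 0, 1, 2, ...
ages = pd.Series([22, 19, 34])

# Custom labels passed through the `index` parameter (illustrative values).
ages_by_name = pd.Series([22, 19, 34], index=['Jane', 'Tracy', 'John'], name='Age')

print(ages.index.tolist())      # the default integer labels
print(ages_by_name['Tracy'])    # look a value up by its label
```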

## DataFrames - 2D representation

A `DataFrame` accepts a dictionary as input and returns 2D, table-like data: each key becomes a column name and each value becomes that column.

In [25]:
people = pd.DataFrame({'Name': names, 'Age': ages, 'Rating': ratings, 'Favourite Food': fav_food})
people

     Name  Age  Rating  Favourite Food
0    Jane   22       2          Apples
1   Tracy   19       3         Bananas
2    John   34       1           Salad
3     Liz   18       4         Chicken
4  Hannan   22       5        Shawarma
5  Hassan   22       4         Berries
6   Joyce   25       4           Pizza
7   Peter   30       3            Coke
8     Ann   29       3     Fruit juice
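
The dictionary values do not have to be `Series`; plain lists of equal length work as well. A minimal sketch, where `sample` is a hypothetical name and the rows are a subset of the data above:

```python
import pandas as pd

# Each dictionary key becomes a column name and each list becomes that column.
# All lists must have the same length.
sample = pd.DataFrame({
    'Name': ['Jane', 'Tracy', 'John'],
    'Age': [22, 19, 34],
})

print(sample.shape)              # (number of rows, number of columns)
print(sample.columns.tolist())   # column names, in dictionary order
```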


## Exporting Data

pandas provides `to_csv` and `to_excel`.

In [26]:
people.to_csv('data/people_data.csv', index=False)

# This requires 'openpyxl'
# people.to_excel('data/people_data.xlsx', index=False)
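
The `index=False` argument matters when the file is read back. The sketch below is an illustration rather than part of the original notebook: it writes to an in-memory string instead of a file (an assumption made to keep it self-contained) and compares the two settings.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Jane', 'Tracy'], 'Age': [22, 19]})

# With index=False only the columns are written.
without_index = df.to_csv(index=False)

# With the default (index=True) the row labels are written as an extra,
# unnamed first column.
with_index = df.to_csv()

# Reading the second string back yields an extra 'Unnamed: 0' column.
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
```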

## Reading Data

From `.csv`

In [23]:
read_data = pd.read_csv('data/people_data.csv')

In [24]:
read_data

     Name  Age  Rating  Favourite Food
0    Jane   22       2          Apples
1   Tracy   19       3         Bananas
2    John   34       1           Salad
3     Liz   18       4         Chicken
4  Hannan   22       5        Shawarma
5  Hassan   22       4         Berries
6   Joyce   25       4           Pizza
7   Peter   30       3            Coke
8     Ann   29       3     Fruit juice


## Describing Data

In [28]:
people.dtypes

Name              object
Age                int64
Rating             int64
Favourite Food    object
dtype: object

In [29]:
people.columns

Index(['Name', 'Age', 'Rating', 'Favourite Food'], dtype='object')

In [30]:
people.index

RangeIndex(start=0, stop=9, step=1)

### Note
By default, `describe()` summarises only the numeric columns. Pass `include='all'` to cover the other columns as well.

In [32]:
people.describe()

             Age    Rating
count   9.000000  9.000000
mean   24.555556  3.222222
std     5.387743  1.201850
min    18.000000  1.000000
25%    22.000000  3.000000
50%    22.000000  3.000000
75%    29.000000  4.000000
max    34.000000  5.000000
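
The summary above leaves out `Name` and `Favourite Food` because they are not numeric. A minimal sketch, using a small hypothetical frame `df`, of how the `include` parameter widens the summary:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Tracy', 'John'],
    'Age': [22, 19, 34],
})

# Default: only the numeric columns are summarised.
numeric_summary = df.describe()

# include='all': every column is summarised. Text columns get count, unique,
# top and freq; statistics that do not apply to a column are left as NaN.
full_summary = df.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```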


In [33]:
people.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Name            9 non-null      object
 1   Age             9 non-null      int64 
 2   Rating          9 non-null      int64 
 3   Favourite Food  9 non-null      object
dtypes: int64(2), object(2)
memory usage: 416.0+ bytes


In [34]:
people['Rating'].mean()

3.2222222222222223

In [35]:
len(people)

9

## Viewing and Selecting Data

In [36]:
people.head()

     Name  Age  Rating  Favourite Food
0    Jane   22       2          Apples
1   Tracy   19       3         Bananas
2    John   34       1           Salad
3     Liz   18       4         Chicken
4  Hannan   22       5        Shawarma


### Get Row By Index

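Use `loc[label]`. This selects the row whose index label matches the value given, not the row at that position. Here the index is the default `0` to `8`, so labels and positions happen to coincide.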

In [38]:
people.loc[4]

Name                Hannan
Age                     22
Rating                   5
Favourite Food    Shawarma
Name: 4, dtype: object

You can also use `iloc[position]` to get the row at that integer position, counting from 0. It ignores the index labels, so it still works when the labels are out of order or some are skipped.

In [39]:
people.iloc[3]

Name                  Liz
Age                    18
Rating                  4
Favourite Food    Chicken
Name: 3, dtype: object
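
With the default index the two attributes return the same rows, so the difference is easy to miss. A minimal sketch, assuming a hypothetical frame `df` whose labels are out of order and skip some numbers:

```python
import pandas as pd

# Index labels are deliberately out of order and non-contiguous.
df = pd.DataFrame({'Name': ['Jane', 'Tracy', 'John']}, index=[10, 3, 7])

by_label = df.loc[3]        # the row whose index label is 3
by_position = df.iloc[0]    # the first row, whatever its label is

print(by_label['Name'])
print(by_position['Name'])
```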

Both attributes support slicing. A `loc` slice includes the end label, while an `iloc` slice stops before the end position.

In [46]:
people.loc[:3]

    Name  Age  Rating  Favourite Food
0   Jane   22       2          Apples
1  Tracy   19       3         Bananas
2   John   34       1           Salad
3    Liz   18       4         Chicken


In [45]:
people.iloc[:3]

    Name  Age  Rating  Favourite Food
0   Jane   22       2          Apples
1  Tracy   19       3         Bananas
2   John   34       1           Salad
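
The two outputs above differ by one row because of how each attribute treats the end of a slice. A short sketch with string labels (a hypothetical frame `df`) makes this easier to see:

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 19, 34, 18]}, index=['a', 'b', 'c', 'd'])

label_slice = df.loc['a':'c']    # label-based: includes the row labelled 'c'
position_slice = df.iloc[0:2]    # position-based: stops before position 2

print(label_slice.index.tolist())
print(position_slice.index.tolist())
```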


### Boolean Indexing

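Boolean indexing passes a condition inside the square brackets of a DataFrame. The condition is evaluated for every row, and only the rows where it is `True` are returned.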

In [47]:
people[people.Rating > 3]

     Name  Age  Rating  Favourite Food
3     Liz   18       4         Chicken
4  Hannan   22       5        Shawarma
5  Hassan   22       4         Berries
6   Joyce   25       4           Pizza
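
The condition itself is a `Series` of `True`/`False` values, and several conditions can be combined. A minimal sketch on a hypothetical four-row frame `df`, with values borrowed from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Liz', 'Hannan', 'John'],
    'Age': [22, 18, 22, 34],
    'Rating': [2, 4, 5, 1],
})

# The condition produces one Boolean value per row.
mask = df['Rating'] > 3
high_rated = df[mask]

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
high_rated_22 = df[(df['Rating'] > 3) & (df['Age'] == 22)]

print(mask.tolist())
print(high_rated_22['Name'].tolist())
```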
