# Pandas - Anees Ahmad - Review - 2021/01/09

## Installing Pandas
If you have installed Anaconda, pandas is already included. For other Python installations, run `pip install pandas` to install the latest version.
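To confirm that the installation worked, a minimal check is to import the library and print its version. The variable name `installed_version` is only illustrative, and the version string depends on your environment.

```python
import pandas as pd

# Confirm that pandas is importable and report which version is installed
installed_version = pd.__version__
print(installed_version)
```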

## Section 5.1 Pandas Series
![series](series1.png)

In [1]:
# pandas must be imported before it can be used
import pandas as pd
obj = pd.Series([4, 7, -5, 3])
print(obj)

0    4
1    7
2   -5
3    3
dtype: int64


### A pandas Series holds two things in one object
1. Values
2. Index

In [2]:
sales = pd.Series([100, 200, 100, 400])
print(sales.values)
print(sales.index)

[100 200 100 400]
RangeIndex(start=0, stop=4, step=1)


In [3]:
sales = pd.Series([100, 200, 100, 400], index = ['Jan', 'Feb', 'Mar','Apr'])
print(sales)
print(sales.values)
print(sales.index)

Jan    100
Feb    200
Mar    100
Apr    400
dtype: int64
[100 200 100 400]
Index(['Jan', 'Feb', 'Mar', 'Apr'], dtype='object')


In [4]:
sales = pd.Series([100, 200, 100, 400], index = ['Jan', 'Feb', 'Mar','Apr'], name="4 month sales") 
print(sales)

Jan    100
Feb    200
Mar    100
Apr    400
Name: 4 month sales, dtype: int64


### Create a pandas Series of canteen data: how many sandwiches were sold each day for one week

In [5]:
sw = pd.Series([20, 30, 20, 25, 30, 40, 0], 
               index = ['mon', 'tue', 'wed', 'thr','fri', 'sat', 'sun'])
print(sw)

mon    20
tue    30
wed    20
thr    25
fri    30
sat    40
sun     0
dtype: int64


In [6]:
print(sw.iloc[1])  # select by position; plain sw[1] is deprecated on a label index in recent pandas
print(sw["tue"])
print(sw["sun"])


30
30
0


In [7]:
print(sw.iloc[[3, 5]])      # a list of positions
print(sw[["tue", "thr"]])   # a list of labels

thr    25
sat    40
dtype: int64
tue    30
thr    25
dtype: int64


### To select multiple elements from a pandas Series, pass a list of positions or labels
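A minimal sketch of the same selection written with the explicit indexers: `.loc` takes labels and `.iloc` takes integer positions. The names `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

sw = pd.Series([20, 30, 20, 25, 30, 40, 0],
               index=['mon', 'tue', 'wed', 'thr', 'fri', 'sat', 'sun'])

# .loc selects by label, .iloc selects by integer position
by_label = sw.loc[['tue', 'thr']]
by_position = sw.iloc[[3, 5]]

print(by_label)
print(by_position)
```

Using the explicit indexers avoids any ambiguity about whether a number is meant as a label or as a position.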

In [8]:
print(sw > 20)
print()
print(sw[sw > 20])


mon    False
tue     True
wed    False
thr     True
fri     True
sat     True
sun    False
dtype: bool

tue    30
thr    25
fri    30
sat    40
dtype: int64


In [9]:
print(sw * 2)
sw = sw * 2
print(sw)

mon    40
tue    60
wed    40
thr    50
fri    60
sat    80
sun     0
dtype: int64
mon    40
tue    60
wed    40
thr    50
fri    60
sat    80
sun     0
dtype: int64


In [10]:
print(sw)

mon    40
tue    60
wed    40
thr    50
fri    60
sat    80
sun     0
dtype: int64


In [11]:
sw = sw /2
print(sw)

mon    20.0
tue    30.0
wed    20.0
thr    25.0
fri    30.0
sat    40.0
sun     0.0
dtype: float64
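True division with `/` turned the values into `float64` above. The sketch below shows two ways to keep integers, assuming the values divide into whole numbers: floor division with `//`, or an explicit cast with `astype(int)`. The Series `doubled` is a small stand-in for the notebook's data.

```python
import pandas as pd

doubled = pd.Series([40, 60, 40], index=['mon', 'tue', 'wed'])

halved_float = doubled / 2                # true division gives float64
halved_floor = doubled // 2               # floor division keeps an integer dtype
halved_cast = (doubled / 2).astype(int)   # cast back to integers explicitly

print(halved_float.dtype)
print(halved_floor.dtype)
print(halved_cast.tolist())
```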


In [12]:
'mon' in sw

True
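Note that `in` tests the index labels of a Series, not its values. A small sketch, using an illustrative Series named `daily`:

```python
import pandas as pd

daily = pd.Series([20, 30], index=['mon', 'tue'])

label_found = 'mon' in daily          # checks the index labels
value_as_label = 20 in daily          # 20 is a value, not a label
value_found = 20 in daily.values      # search the underlying values instead

print(label_found, value_as_label, value_found)
```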

In [13]:
# build a Series from NumPy arrays
import numpy as np
ar = np.array([3,2,4,5,6])
ind = np.array( ['a', 'b', 'c', 'd', 'e'])

obj2 = pd.Series(ar, index = ind)
print(obj2)

a    3
b    2
c    4
d    5
e    6
dtype: int32


In [14]:
sdata = {"Sindh": 35000, "Panjab": 4500, "KPK": 3000, "Balochistan": 2000}
tax_by_state = pd.Series(sdata)
print(tax_by_state)
print(tax_by_state.index)

Sindh          35000
Panjab          4500
KPK             3000
Balochistan     2000
dtype: int64
Index(['Sindh', 'Panjab', 'KPK', 'Balochistan'], dtype='object')


In [15]:
sdata = {"Sindh": 35000, "Panjab": 45000, "KPK": 30000, "Balochistan": 20000}
tax_by_state = pd.Series(sdata, index = ["Panjab", "Sindh", "KPK", "Balochistan", "GB"])
print(tax_by_state)
print(pd.isnull(tax_by_state))

Panjab         45000.0
Sindh          35000.0
KPK            30000.0
Balochistan    20000.0
GB                 NaN
dtype: float64
Panjab         False
Sindh          False
KPK            False
Balochistan    False
GB              True
dtype: bool


In [16]:
tax_by_state.name= "state tax paying capicity"
tax_by_state.index.name = "states name"
print(tax_by_state)
print(tax_by_state.index)

states name
Panjab         45000.0
Sindh          35000.0
KPK            30000.0
Balochistan    20000.0
GB                 NaN
Name: state tax paying capicity, dtype: float64
Index(['Panjab', 'Sindh', 'KPK', 'Balochistan', 'GB'], dtype='object', name='states name')


In [17]:
# NaN marks a value that is missing from the pandas Series
print(pd.isnull(tax_by_state))
tax_by_state.isnull()

states name
Panjab         False
Sindh          False
KPK            False
Balochistan    False
GB              True
Name: state tax paying capicity, dtype: bool


states name
Panjab         False
Sindh          False
KPK            False
Balochistan    False
GB              True
Name: state tax paying capicity, dtype: bool
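Once missing values are detected, they usually need handling. The sketch below shows three common options on a small, made-up Series named `tax`: `notnull()` to flag present values, `fillna()` to substitute a default, and `dropna()` to discard missing entries. The fill value `0` is an assumption chosen for illustration.

```python
import pandas as pd

# "GB" has no entry in the dictionary, so it becomes NaN
tax = pd.Series({"Sindh": 35000, "KPK": 30000},
                index=["Sindh", "KPK", "GB"])

present = tax.notnull()   # opposite of isnull()
filled = tax.fillna(0)    # replace missing values with 0
dropped = tax.dropna()    # remove entries that are missing

print(present)
print(filled)
print(dropped)
```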

In [18]:
sw = pd.Series([20, 30, 20, 25, 30, 40, 0], 
               index = ['mon', 'tue', 'wed', 'thr','fri', 'sat', 'sun'])
print(sw)
sw.index = ["m", "t", "w", "t", "f", "s", "s"]
print(sw)

mon    20
tue    30
wed    20
thr    25
fri    30
sat    40
sun     0
dtype: int64
m    20
t    30
w    20
t    25
f    30
s    40
s     0
dtype: int64
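The new index above contains repeated labels (`"t"` and `"s"`). pandas allows this, but selection by a repeated label then returns several rows. A short sketch of the effect:

```python
import pandas as pd

sw = pd.Series([20, 30, 20, 25, 30, 40, 0],
               index=["m", "t", "w", "t", "f", "s", "s"])

# is_unique reports whether every label appears only once
print(sw.index.is_unique)

# A duplicated label returns a Series with every match, not a single value
t_days = sw["t"]
print(t_days)
```

If each row must be addressable on its own, keeping the labels unique (as in `'tue'` and `'thr'`) is the safer choice.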


**Sources for Series data**
* data passed directly to `pd.Series`
* from a NumPy array or list
* from a dictionary
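The three sources listed above can be sketched side by side. The names `labels`, `from_direct`, `from_array`, and `from_dict` are illustrative.

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]

from_direct = pd.Series([1, 2, 3], index=labels)            # data passed directly
from_array = pd.Series(np.array([1, 2, 3]), index=labels)   # from a NumPy array
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})             # dictionary keys become the index

print(from_direct)
print(from_array)
print(from_dict)
```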

## Pandas DataFrame overview

A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row index and a column index.
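As a small illustration of columns with different value types, the sketch below builds a made-up table (`mixed_df` and its contents are hypothetical) and inspects its column types and both indexes.

```python
import pandas as pd

# Hypothetical table with one column per value type
mixed_df = pd.DataFrame({
    "item": ["sandwich", "juice", "tea"],   # strings
    "price": [2.5, 1.0, 0.5],               # floats
    "in_stock": [True, False, True],        # booleans
})

print(mixed_df.dtypes)    # one dtype per column
print(mixed_df.index)     # row index
print(mixed_df.columns)   # column index
```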

### DataFrame
![dataframe](./finallpandas.png)

In [19]:
apples = pd.Series([3,2,0,1])
oranges = pd.Series([3,4,7,8])

print(apples, oranges)
data = {"apples": apples, "oranges": oranges}
print(data)
fruits_df = pd.DataFrame(data)
print(fruits_df)

0    3
1    2
2    0
3    1
dtype: int64 0    3
1    4
2    7
3    8
dtype: int64
{'apples': 0    3
1    2
2    0
3    1
dtype: int64, 'oranges': 0    3
1    4
2    7
3    8
dtype: int64}
   apples  oranges
0       3        3
1       2        4
2       0        7
3       1        8


### Keep in mind: the indexes must match

In [20]:
apples = pd.Series([3,2,0,1], ["a", "b", "c", "d"] )
oranges = pd.Series([3,2,0,1], index = ["mon", "tue", "wed", "thr"])

print(apples, oranges)
data = {"apples": apples, "oranges": oranges}
print(data)
fruits_df = pd.DataFrame(data)
print(fruits_df)
# the two indexes do not match, so pandas fills the gaps with NaN

a    3
b    2
c    0
d    1
dtype: int64 mon    3
tue    2
wed    0
thr    1
dtype: int64
{'apples': a    3
b    2
c    0
d    1
dtype: int64, 'oranges': mon    3
tue    2
wed    0
thr    1
dtype: int64}
     apples  oranges
a       3.0      NaN
b       2.0      NaN
c       0.0      NaN
d       1.0      NaN
mon     NaN      3.0
thr     NaN      1.0
tue     NaN      2.0
wed     NaN      0.0
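When two Series hold values for the same rows but carry different labels, one option is to relabel one of them before combining. This is a sketch under the assumption that the values are already in matching order; `apples_relabelled` and `aligned` are illustrative names.

```python
import pandas as pd

apples = pd.Series([3, 2, 0, 1], index=["a", "b", "c", "d"])
oranges = pd.Series([3, 2, 0, 1], index=["mon", "tue", "wed", "thr"])

# Rebuild apples from its raw values, reusing the labels of oranges
apples_relabelled = pd.Series(apples.to_numpy(), index=oranges.index)

# Both columns now share one index, so no NaN rows are created
aligned = pd.DataFrame({"apples": apples_relabelled, "oranges": oranges})
print(aligned)
```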


In [21]:
import pandas as pd

apples = pd.Series([3,2,0,1] , index = ["mon", "tue", "wed", "thr"] )
oranges = pd.Series([3,2,0,1], index = ["mon", "tue", "wed", "thr"])

print(apples,"\n", oranges)
data = {"apples": apples, "oranges": oranges}
print(data)
fruits_df = pd.DataFrame(data)
print(fruits_df)

mon    3
tue    2
wed    0
thr    1
dtype: int64 
 mon    3
tue    2
wed    0
thr    1
dtype: int64
{'apples': mon    3
tue    2
wed    0
thr    1
dtype: int64, 'oranges': mon    3
tue    2
wed    0
thr    1
dtype: int64}
     apples  oranges
mon       3        3
tue       2        2
wed       0        0
thr       1        1


In [22]:
state = ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada']
data = {'state': state ,
        'year' : [2000, 2001, 2002, 2001, 2002, 2003],
        'pop'  : [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
state_pop_df = pd.DataFrame(data 
                     , index = ['1st', '2nd', '3rd', 
                                '4th', '5th','6th'] )
print(state_pop_df)

      state  year  pop
1st    Ohio  2000  1.5
2nd    Ohio  2001  1.7
3rd    Ohio  2002  3.6
4th  Nevada  2001  2.4
5th  Nevada  2002  2.9
6th  Nevada  2003  3.2


In [23]:
state_pop_df.head()

      state  year  pop
1st    Ohio  2000  1.5
2nd    Ohio  2001  1.7
3rd    Ohio  2002  3.6
4th  Nevada  2001  2.4
5th  Nevada  2002  2.9


In [24]:
state_pop_df =pd.DataFrame(data, columns=['year', 'state', 'pop'])
print(state_pop_df)

   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
5  2003  Nevada  3.2


In [25]:
frame2 = pd.DataFrame(data,
                      columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2.head()

       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN


In [26]:
print(frame2.columns)
print( frame2.index )

Index(['year', 'state', 'pop', 'debt'], dtype='object')
Index(['one', 'two', 'three', 'four', 'five', 'six'], dtype='object')


In [27]:
frame2

       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
six    2003  Nevada  3.2  NaN


In [28]:
# select the rows where state is Ohio
frame2[frame2["state"] == "Ohio"]

       year state  pop debt
one    2000  Ohio  1.5  NaN
two    2001  Ohio  1.7  NaN
three  2002  Ohio  3.6  NaN
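Row filters like the one above can be combined. The sketch below rebuilds a similar table under the illustrative name `frame`, then joins two conditions with `&` and uses `.loc` to filter rows and pick a column in one step.

```python
import pandas as pd

frame = pd.DataFrame({
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}, index=["one", "two", "three", "four", "five", "six"])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
ohio_after_2000 = frame[(frame["state"] == "Ohio") & (frame["year"] > 2000)]
print(ohio_after_2000)

# .loc filters rows and selects a column in a single step
nevada_pop = frame.loc[frame["state"] == "Nevada", "pop"]
print(nevada_pop)
```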


In [29]:
frame2["debt"] = 2.5
frame2

       year   state  pop  debt
one    2000    Ohio  1.5   2.5
two    2001    Ohio  1.7   2.5
three  2002    Ohio  3.6   2.5
four   2001  Nevada  2.4   2.5
five   2002  Nevada  2.9   2.5
six    2003  Nevada  3.2   2.5
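Assigning a single value fills the whole column. A column can also be assigned from a list or from a Series, as in this sketch; the table `frame` and the column `bonus` are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"state": ["Ohio", "Ohio", "Nevada"],
                      "pop": [1.5, 1.7, 2.4]},
                     index=["one", "two", "three"])

# A list must supply exactly one value per row
frame["debt"] = [0.5, 1.0, 1.5]

# A Series is aligned on the index; rows without a matching label become NaN
frame["bonus"] = pd.Series([10.0, 20.0], index=["one", "three"])

print(frame)
```

Because the Series is matched by label rather than by position, row `two` receives NaN in `bonus`.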
