# Pandas Notes

- Provides two data structures for data manipulation: Series and DataFrame
- Series:
    - 1D data structure
    - the axis labels are collectively called the index
- DataFrame:
    - 2D data structure
    - similar to an Excel sheet
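
To make the two structures concrete, here is a minimal sketch. The values and labels (`s`, `frame`, `"a"`, `"x"`, and so on) are made up for illustration and do not come from the notes above.

```python
import pandas as pd

# A Series is one-dimensional: values plus an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s.index.tolist())   # ['a', 'b', 'c']
print(s["b"])             # 20

# A DataFrame is two-dimensional: labeled rows and columns, like a spreadsheet.
frame = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
print(frame.shape)        # (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(frame["x"]).__name__)   # Series
```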

#### Creating DataFrames

- Syntax: `pandas.DataFrame(data, index, columns)`
    - Parameters: 
        - data: The dataset from which the DataFrame is created. It can be a list, dictionary, Series, array, etc. A scalar value also works, but only when `index` and `columns` are given as well.
        - index: Optional. Defines the row labels explicitly. By default, rows are labeled 0 to n-1, where n is the number of rows.
        - columns: Optional. Provides the column names. If omitted and the data carries no names of its own (dict keys, for example), columns are labeled 0 to n-1, where n is the number of columns.
    - Returns: DataFrame object
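
The effect of `index` and `columns` can be sketched as follows. The sample data and the labels `r1`..`r3`, `a`, `b` are illustrative assumptions.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]

# No index or columns given: both default to 0..n-1.
default_df = pd.DataFrame(data)
print(default_df.index.tolist())     # [0, 1, 2]
print(default_df.columns.tolist())   # [0, 1]

# Explicit row and column labels.
labeled_df = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["a", "b"])
print(labeled_df.loc["r2", "b"])     # 4
```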



In [3]:
import pandas as pd

df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [19]:
#From List
lst = ['Hello','World',1,2]
df = pd.DataFrame(lst, columns=['Col1'])  # without `columns`, the single column would be labeled 0
print(df)
print()

#nested lists
lsts = [['a','b','c'],['d','e','f']]
df1 = pd.DataFrame(lsts)
print(df1)

    Col1
0  Hello
1  World
2      1
3      2

   0  1  2
0  a  b  c
1  d  e  f


In [16]:
# from dict
mapping = {'Hello':['World'],1:[2],'Creating df':['from dictionary']}
df = pd.DataFrame(mapping)
# keys become column names; values become the column entries
df

   Hello  1      Creating df
0  World  2  from dictionary


In [20]:
# from a list of dictionaries

# Initialize data as a list of dicts.
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30}]

# Creates DataFrame.
df = pd.DataFrame(data)

print(df)


    a   b   c
0   1   2   3
1  10  20  30


In [22]:
import pandas as pd

# Initialize data as a list of dicts; the second dict has no 'c' key, so that cell becomes NaN.
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20}]

# Creates DataFrame.
df = pd.DataFrame(data,index=['row1','row2'])

print(df)


       a   b    c
row1   1   2  3.0
row2  10  20  NaN


Other ways of converting a list of dicts to a DataFrame include:
- Using `pd.DataFrame.from_records()`
- Using `pd.DataFrame.from_dict()`
- Using `pd.json_normalize()`

In [37]:
# Initialise data to lists.  
data = [{'Geeks': 'dataframe', 'For': 'using', 'geeks': 'list'}, 
        {'Geeks':10, 'For': 20, 'geeks': 30}]  
  
df = pd.DataFrame.from_records(data,index=['1', '2'])
print(df)
print()

df = pd.DataFrame.from_dict(data)
print(df)
print()

df = pd.json_normalize(data)
print(df)

       Geeks    For geeks
1  dataframe  using  list
2         10     20    30

       Geeks    For geeks
0  dataframe  using  list
1         10     20    30

       Geeks    For geeks
0  dataframe  using  list
1         10     20    30


In [23]:
#creating df from another df
original_df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
})

new_df = original_df[['Name']] 
print(new_df)


    Name
0    Tom
1   Nick
2  Krish
3   Jack
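
When the new DataFrame should be modified independently of the original, an explicit `.copy()` makes that intent clear. This is a minimal sketch and an assumption about the use case; the cell above does not call `.copy()`.

```python
import pandas as pd

original_df = pd.DataFrame({"Name": ["Tom", "Nick"], "Age": [20, 21]})

# .copy() makes the new frame explicitly independent of the original.
new_df = original_df[["Name"]].copy()
new_df.loc[0, "Name"] = "Thomas"

print(original_df.loc[0, "Name"])   # Tom
print(new_df.loc[0, "Name"])        # Thomas
```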


In [29]:
# From a Series (`lst` is the list defined in the earlier "From List" cell)
ser = pd.Series(lst)
df = pd.DataFrame(ser)

# The same applies to a dict of Series: the indexes are aligned and gaps become NaN.
d = {'one': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30],
                      index=['a', 'b', 'c'])}

# creates Dataframe.
df = pd.DataFrame(d)

print(df)


   one   two
a   10  10.0
b   20  20.0
c   30  30.0
d   40   NaN


In [35]:
# Using Zip
# Zip 2 lists and give as input to df

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

# Merge the two lists into a list of tuples using zip().
list_of_tuples = list(zip(Name, Age))

list_of_tuples



[('tom', 25), ('krish', 30), ('nick', 26), ('juli', 22)]

In [34]:
df = pd.DataFrame(list_of_tuples)
df

       0   1
0    tom  25
1  krish  30
2   nick  26
3   juli  22
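
The zipped DataFrame above gets the default column labels 0 and 1. Passing `columns` gives them readable names; the names `Name` and `Age` below are an assumption based on the list variables in the zip cell.

```python
import pandas as pd

names = ["tom", "krish", "nick", "juli"]
ages = [25, 30, 26, 22]

# zip() pairs the lists element-wise; `columns` names the resulting columns.
df = pd.DataFrame(list(zip(names, ages)), columns=["Name", "Age"])
print(df.columns.tolist())   # ['Name', 'Age']
print(df.loc[1, "Age"])      # 30
```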


##### Common Methods for df
- head(): Returns the first n rows (n defaults to 5).
- tail(): Returns the last n rows (n defaults to 5).
- info(): Prints a summary of the DataFrame (index, column dtypes, non-null counts, memory usage).
- describe(): Generates descriptive statistics.
- sort_values(): Sorts the DataFrame by the specified columns.
- groupby(): Groups the DataFrame using a mapper or by a Series of columns.
- merge(): Merges DataFrame or named Series objects with a database-style join.
- apply(): Applies a function along an axis of the DataFrame.
- drop(): Removes the specified labels from rows or columns.
- pivot_table(): Creates a spreadsheet-style pivot table.
- fillna(): Fills NA/NaN values.
- isnull(): Detects missing values; returns a boolean mask of the same shape.
- notnull(): Detects non-missing values; returns a boolean mask of the same shape (it does not print or filter rows by itself).
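
A few of these methods in action, as a minimal sketch. The sample data, including the `Team` column and the missing age, is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["tom", "krish", "nick", "juli"],
    "Age": [25, 30, None, 22],      # None becomes NaN
    "Team": ["A", "B", "A", "B"],
})

print(df.head(2))                               # first 2 rows
print(df.sort_values("Age").iloc[0]["Name"])    # juli (NaN rows sort last)
print(df["Age"].isnull().tolist())              # [False, False, True, False]

# Fill the missing age with 0, then average the ages per team.
filled = df.fillna({"Age": 0})
print(filled.groupby("Team")["Age"].mean().to_dict())   # {'A': 12.5, 'B': 26.0}

# drop() returns a new DataFrame; df itself is unchanged.
print(df.drop(columns=["Team"]).columns.tolist())       # ['Name', 'Age']
```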

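Output of `df.describe()` for the zipped DataFrame above (only the numeric column `1` is summarized):

               1
count   4.000000
mean   25.750000
std     3.304038
min    22.000000
25%    24.250000
50%    25.500000
75%    27.000000
max    30.000000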
