### Pandas 

* Pandas is a Python library for working with data sets.
* It has functions for analyzing, cleaning, exploring, and manipulating data, which lets us draw conclusions based on statistical theory.
* It can clean messy data sets and make them readable and relevant.
* To work with pandas, we first have to import the library, just as we do with NumPy:
<pre>
import pandas as pd
</pre>

* In pandas, data is stored in tabular form, like a table, in an object called a DataFrame.
* With pandas:
  - data manipulation and analysis become fast and efficient
  - data from different file formats can be loaded
  - missing data is easy to handle
  - columns can be inserted into and deleted from a DataFrame and higher-dimensional objects
  - data sets can be merged and joined
  - data sets can be flexibly reshaped and pivoted
  - time-series functionality is provided
  - powerful group-by operations can be performed
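
A few of the capabilities listed above can be sketched in a handful of lines. The tables `scores` and `ages` below are hypothetical sample data invented for this illustration; the sketch only shows one way to fill a missing value, merge two tables, and group rows.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
scores = pd.DataFrame({
    "name": ["Amy", "Ashley", "Shreya", "Sugandh"],
    "course": ["Data Analytics", "UX", "Cloud computing", "Data Analytics"],
    "score": [90.0, None, 80.0, 70.0],  # None is stored as NaN (missing)
})
ages = pd.DataFrame({
    "name": ["Amy", "Ashley", "Shreya", "Sugandh"],
    "age": [23, 21, 22, 21],
})

# Handling missing data: replace the missing score with 0
filled = scores.fillna({"score": 0.0})

# Merging: combine the two tables on the shared "name" column
merged = filled.merge(ages, on="name")

# Group by: average score per course
mean_by_course = merged.groupby("course")["score"].mean()

print(merged)
print(mean_by_course)
```

Whether a missing score should really count as 0 depends on the analysis; `dropna()` is the usual alternative when rows with missing values should be discarded instead.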

------------------------------------------------------------------------
Pandas provides two main data structures for manipulating data:

- Series
- DataFrame

* **Pandas Series** is a one-dimensional labelled array that can hold data of any type (integer, string, float, Python objects, etc.). A Pandas Series is like a column in a table. It can be created from a list, a dictionary, a scalar value, etc.
* **Pandas DataFrame** is a two-dimensional data structure, i.e., data is aligned in tabular form in rows and columns. It can be created from lists, a dictionary, a list of dictionaries, etc.

*A Series is like a column; a DataFrame is the whole table.*
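
The relationship between the two structures can be checked directly. In this minimal sketch, `table` and `constant` are hypothetical names chosen for illustration: selecting a single column of a DataFrame gives back a Series, and a Series built from a scalar repeats that value for every index label.

```python
import pandas as pd

# Hypothetical two-column table, invented for illustration
table = pd.DataFrame({"Name": ["Amy", "Ashley"], "Age": [22, 21]})

# Selecting one column returns a Series
age_column = table["Age"]
print(type(table).__name__)       # DataFrame
print(type(age_column).__name__)  # Series

# A Series built from a scalar repeats the value for each index label
constant = pd.Series(5, index=["a", "b", "c"])
print(constant)
```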

In [1]:
import pandas as pd

#### 1. Series

After importing the pandas package as `pd`, we can create a Series from a list.

In [2]:
# Create a simple Pandas Series from a list

lst = [1, 7, 2, 4, 9]

series1 = pd.Series(lst)

print(series1)

0    1
1    7
2    2
3    4
4    9
dtype: int64


The values are labeled with their index numbers.

The first value has index 0, the second value has index 1, and so on.

This label can be used to access a specified value.

In [3]:
# to access a specified value
print(series1[1])

7
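
Square brackets look values up by label. When it matters whether a lookup is by label or by position, pandas also offers the explicit accessors `.loc` (label) and `.iloc` (position). The sketch below assumes a second, hypothetical Series `labelled` with string labels to make the difference visible.

```python
import pandas as pd

series1 = pd.Series([1, 7, 2, 4, 9])

# With the default 0, 1, 2, ... index, label and position coincide
by_label = series1.loc[1]
by_position = series1.iloc[1]
last_value = series1.iloc[-1]   # positions can count from the end

# Hypothetical Series with string labels, invented for illustration
labelled = pd.Series([1, 7, 2], index=["x", "y", "z"])
print(labelled.loc["y"])   # lookup by label
print(labelled.iloc[1])    # lookup by position, same element
```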


In [4]:
# A key/value object, like a dictionary, can also be used when creating a Series

weight = {"day1": 42, "day2": 38, "day3": 39}  # The keys of the dictionary become the labels

series2 = pd.Series(weight)

print(series2)

day1    42
day2    38
day3    39
dtype: int64


In [5]:
print(series2['day2'])

38
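
When only some of the dictionary entries are needed, the `index` argument of `pd.Series` can list the keys to keep. A minimal sketch, where `subset` is a name chosen for illustration:

```python
import pandas as pd

weight = {"day1": 42, "day2": 38, "day3": 39}

# Only the keys listed in index are kept, in the order given
subset = pd.Series(weight, index=["day1", "day3"])
print(subset)
```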


#### 2. DataFrame 

We can create a DataFrame from a dictionary using **pd.DataFrame**.

In [6]:
# Creating dataframe from dictionary

students = {
    'Amy':22, 
    'Ashley':21, 
    'Shreya':23
    }

In [7]:
# Create a DataFrame from a list of (key, value) tuples
df = pd.DataFrame(list(students.items()))
  
df

|   | 0 | 1 |
|---|---|---|
| 0 | Amy | 22 |
| 1 | Ashley | 21 |
| 2 | Shreya | 23 |


Pandas assigned automatic column labels, 0 and 1.<br>
To specify them manually, we can set the `columns` parameter to a list with the desired labels.

In [8]:
# with custom column labels
df = pd.DataFrame(list(students.items()), columns = ['Name', 'Age'])

df

|   | Name | Age |
|---|------|-----|
| 0 | Amy | 22 |
| 1 | Ashley | 21 |
| 2 | Shreya | 23 |
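
Column labels can also be changed after the DataFrame exists. The sketch below uses `DataFrame.rename`, which returns a new DataFrame and leaves the original untouched; `renamed` is a name chosen for illustration.

```python
import pandas as pd

students = {'Amy': 22, 'Ashley': 21, 'Shreya': 23}
df = pd.DataFrame(list(students.items()))  # automatic column labels 0 and 1

# Map each old label to its new name
renamed = df.rename(columns={0: 'Name', 1: 'Age'})

print(list(df.columns))       # original labels are unchanged
print(list(renamed.columns))
```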


In [9]:
# dictionary with list object in values
student = {
    'Name' : ['Amy', 'Ashley', 'Shreya', 'Sugandh'],
    'Age' : [23, 21, 22, 21],
    'Course' : ['Data Analytics', 'UX', 'Cloud computing', 'Data Analytics'],
}

In [10]:
# creating a Dataframe object 
df = pd.DataFrame(student)

In [11]:
df

|   | Name | Age | Course |
|---|------|-----|--------|
| 0 | Amy | 23 | Data Analytics |
| 1 | Ashley | 21 | UX |
| 2 | Shreya | 22 | Cloud computing |
| 3 | Sugandh | 21 | Data Analytics |


Pandas assigned automatic row labels, 0 to 3.<br>
To specify them manually, we can set the `index` parameter to a list with the desired labels.

In [12]:
# with custom indexing
df = pd.DataFrame(student, index = ['a', 'b', 'c', 'd'])

df

|   | Name | Age | Course |
|---|------|-----|--------|
| a | Amy | 23 | Data Analytics |
| b | Ashley | 21 | UX |
| c | Shreya | 22 | Cloud computing |
| d | Sugandh | 21 | Data Analytics |
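
Custom row labels become useful when selecting rows. As a minimal sketch, `.loc` selects by those labels while `.iloc` still selects by position; the variable names `row_b`, `second_row`, and `some_names` are chosen for illustration.

```python
import pandas as pd

student = {
    'Name': ['Amy', 'Ashley', 'Shreya', 'Sugandh'],
    'Age': [23, 21, 22, 21],
    'Course': ['Data Analytics', 'UX', 'Cloud computing', 'Data Analytics'],
}
df = pd.DataFrame(student, index=['a', 'b', 'c', 'd'])

# One row by label: the result is a Series keyed by column name
row_b = df.loc['b']
print(row_b['Name'])

# The same row by position
second_row = df.iloc[1]

# Several rows and one column at once
some_names = df.loc[['a', 'c'], 'Name']
print(some_names)
```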


We can also create a DataFrame from a dictionary using only the required columns.

In [13]:
# select only some columns, i.e. skip the Age column

df = pd.DataFrame(student, columns = ['Name', 'Course'])
  
df

|   | Name | Course |
|---|------|--------|
| 0 | Amy | Data Analytics |
| 1 | Ashley | UX |
| 2 | Shreya | Cloud computing |
| 3 | Sugandh | Data Analytics |
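
One thing to watch with the `columns` argument: a label that is not a key of the dictionary does not raise an error. Instead, pandas creates that column and fills it with missing values. In this sketch, `'City'` is a hypothetical column name that is deliberately absent from the dictionary.

```python
import pandas as pd

student = {
    'Name': ['Amy', 'Ashley', 'Shreya', 'Sugandh'],
    'Age': [23, 21, 22, 21],
}

# 'City' is not a key of the dictionary, so the column is all NaN
df = pd.DataFrame(student, columns=['Name', 'City'])
print(df)
print(df['City'].isna().all())
```

A misspelled column name therefore shows up as an empty column rather than as an exception, so it is worth checking the result.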


#### Creating DataFrame from a list

In [14]:
# list of strings
lst = ['Amy', 'Ashley', 'Shreya', 'Sugandh']
   
# Calling DataFrame constructor on list
df = pd.DataFrame(lst,columns=['Name'])
print(df)

      Name
0      Amy
1   Ashley
2   Shreya
3  Sugandh


In [15]:
# list of strings
lst = ['Amy', 'Ashley', 'Shreya', 'Sugandh']
  
# list of int
lst2 = [23, 21, 22, 21]
  
# zip pairs up the two lists row by row; the column names are specified
df = pd.DataFrame(list(zip(lst, lst2)),
               columns =['Name', 'Age'])
df

|   | Name | Age |
|---|------|-----|
| 0 | Amy | 23 |
| 1 | Ashley | 21 |
| 2 | Shreya | 22 |
| 3 | Sugandh | 21 |


In [16]:
# Creating DataFrame using multi-dimensional list

# Nested-List 
lst = [
       ['Amy',23],
       ['Ashley',21],
       ['Shreya',22],
       ['Sugandh',21]
       ]

df = pd.DataFrame(lst, columns =['Name', 'Age'])
df

|   | Name | Age |
|---|------|-----|
| 0 | Amy | 23 |
| 1 | Ashley | 21 |
| 2 | Shreya | 22 |
| 3 | Sugandh | 21 |


In [17]:
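# Using multi-dimensional list with column names and a dtype specified

lst = [
       ['Amy','Willson',23],
       ['Ashley','John',21],
       ['Shreya','Sharma',22],
       ['Sugandh','Bansal',21]
       ]
# Passing dtype=float to the constructor tries to cast every column,
# including the name strings, which recent pandas versions may reject.
# Casting only the Age column afterwards avoids that.
df = pd.DataFrame(lst, columns=['First Name', 'Last Name', 'Age'])
df = df.astype({'Age': float})
df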


|   | First Name | Last Name | Age |
|---|------------|-----------|-----|
| 0 | Amy | Willson | 23.0 |
| 1 | Ashley | John | 21.0 |
| 2 | Shreya | Sharma | 22.0 |
| 3 | Sugandh | Bansal | 21.0 |


In [18]:
# Using a dictionary of lists to create a DataFrame

nme = ["aparna", "pankaj", "sudhir", "Geeku"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr = [90, 40, 80, 98]

# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name': nme, 'degree': deg, 'score': scr}

df = pd.DataFrame(data)
    
df 

|   | name | degree | score |
|---|------|--------|-------|
| 0 | aparna | MBA | 90 |
| 1 | pankaj | BCA | 40 |
| 2 | sudhir | M.Tech | 80 |
| 3 | Geeku | MBA | 98 |
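
A DataFrame can also be built from a list of dictionaries, where each dictionary is one row and the keys become column labels. The `records` list below is hypothetical data invented for illustration; the second record deliberately omits the `score` key to show that pandas fills the gap with a missing value.

```python
import pandas as pd

# Hypothetical records, invented for illustration
records = [
    {'name': 'aparna', 'degree': 'MBA', 'score': 90},
    {'name': 'pankaj', 'degree': 'BCA'},  # no 'score' key in this row
]

df = pd.DataFrame(records)
print(df)
```

Because the `score` column now contains a missing value, pandas stores it as floating point rather than as integers.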
