#Welcome to Pandas Tutorial

Pandas is an open-source Python library that provides data structures such as Series and DataFrame. It works like a spreadsheet for Python, making it easy to clean, transform, and analyze large datasets.
*   Offers data transformation, aggregation, and visualization.




#Pandas basics
Data structures:
*   Series: 1-D labelled array that can hold any data type.
*   DataFrame: 2-D labelled data structure made up of rows and columns.

#Read and Write to Csv in Pandas


In [None]:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) #Creating a DataFrame.
print(df)
df.to_csv('abc.csv')  # Writes the DataFrame, including its index, to abc.csv.


   a  b
0  1  4
1  2  5
2  3  6


In [None]:
newdf = pd.read_csv('abc.csv')  # Reads from a CSV file. Similar readers include read_json(), read_sql(), and read_html().
print(newdf)  # Note the extra 'Unnamed: 0' column: to_csv() saved the index and read_csv() read it back as data.
anotherdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['First', 'Second', 'Third'])  # DataFrame with custom row labels.
print(anotherdf)


series = pd.Series(data = [1, 2, 3, 4, 5], index = ['A', 'B', 'C', 'D', 'E']) #Creating a Series.
anotherSeries = pd.Series(data = [1, 2, 3, 4, 5])
print(anotherSeries)
print(series)

#ADDITIONAL FEATURES OF SERIES AND DATAFRAME
series.dtypes          # Data type of the values.
print(series.shape)    # Shape of the data as a tuple.
print(newdf.shape)

newdf.info()        # Summary of the DataFrame: index, columns, non-null counts, dtypes, memory usage.
newdf.values        # Returns only the values, as a NumPy array.








   Unnamed: 0  a  b
0           0  1  4
1           1  2  5
2           2  3  6
        a  b
First   1  4
Second  2  5
Third   3  6
0    1
1    2
2    3
3    4
4    5
dtype: int64
A    1
B    2
C    3
D    4
E    5
dtype: int64
(5,)
(3, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  3 non-null      int64
 1   a           3 non-null      int64
 2   b           3 non-null      int64
dtypes: int64(3)
memory usage: 204.0 bytes


array([[0, 1, 4],
       [1, 2, 5],
       [2, 3, 6]])
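
The `Unnamed: 0` column above appears because `to_csv()` writes the index by default. A minimal sketch of two ways to avoid it, using the `index` parameter of `to_csv()` and the `index_col` parameter of `read_csv()`. The in-memory `io.StringIO` buffer is an assumption made to keep the example self-contained; a file name such as `'abc.csv'` works the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Option 1: do not write the index at all.
buffer = io.StringIO()          # in-memory stand-in for a file such as 'abc.csv'
df.to_csv(buffer, index=False)
buffer.seek(0)
no_index = pd.read_csv(buffer)

# Option 2: write the index, then tell read_csv() which column holds it.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

print(no_index.columns.tolist())
print(with_index.columns.tolist())
```

In both cases the DataFrame read back has only the original `a` and `b` columns.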

#Sorting, ReIndexing, Renaming, Reshaping, Dropping:


In [None]:
data = {'Fruits': ['Mango', 'Apple', 'Banana', 'Orange'],
        'Quantity': [40, 20, 25, 10],
        'Price': [80, 100, 50, 70]
        }
df = pd.DataFrame(data) # New dataFrame
print(df)


print(df.sort_values('Price', ascending=True))  # True for ascending, False for descending.

print(df.sort_index(ascending=False))

df.rename(columns={
    'Fruits': 'FRUIT',
    'Quantity': 'QUANTITY',
    'Price': 'PRICE'
}, inplace=True)          # inplace=True modifies the original DataFrame instead of returning a copy.

print(df)

print(df.rename(index={
    0: 'First',
    1: 'Second',
    2: 'Third',
    3: 'Fourth'
}))
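pd.melt(df)           # Unpivots columns into rows of variable/value pairs (result not printed here).

pivot = df.pivot(columns='FRUIT', values=['QUANTITY', 'PRICE'])
print(pivot)        # Reshapes the data so that each fruit becomes a column; cells with no match are NaN.

df1 = df.drop(columns=['FRUIT'])  # Drops the FRUIT column and returns a new DataFrame.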
print(df1)





   Fruits  Quantity  Price
0   Mango        40     80
1   Apple        20    100
2  Banana        25     50
3  Orange        10     70
   Fruits  Quantity  Price
2  Banana        25     50
3  Orange        10     70
0   Mango        40     80
1   Apple        20    100
   Fruits  Quantity  Price
3  Orange        10     70
2  Banana        25     50
1   Apple        20    100
0   Mango        40     80
    FRUIT  QUANTITY  PRICE
0   Mango        40     80
1   Apple        20    100
2  Banana        25     50
3  Orange        10     70
         FRUIT  QUANTITY  PRICE
First    Mango        40     80
Second   Apple        20    100
Third   Banana        25     50
Fourth  Orange        10     70
      QUANTITY                      PRICE                    
FRUIT    Apple Banana Mango Orange  Apple Banana Mango Orange
0          NaN    NaN  40.0    NaN    NaN    NaN  80.0    NaN
1         20.0    NaN   NaN    NaN  100.0    NaN   NaN    NaN
2          NaN   25.0   NaN    NaN    NaN   50.0   N
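
The result of `pd.melt(df)` is not printed above, and `pivot()` without an `index` argument leaves most cells as NaN. A small sketch of how the two reshapes relate, assuming the renamed fruit DataFrame from this section. The names `long_df` and `wide_df` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'FRUIT': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Keep FRUIT as an identifier; stack QUANTITY and PRICE into variable/value pairs.
long_df = pd.melt(df, id_vars='FRUIT', value_vars=['QUANTITY', 'PRICE'])
print(long_df)

# pivot() reverses the reshape: one row per fruit, one column per variable.
wide_df = long_df.pivot(index='FRUIT', columns='variable', values='value')
print(wide_df)
```

Passing `index='FRUIT'` gives each fruit a single row, so the pivoted table has no missing cells.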

#Data observations:

In [None]:
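print(df.head())             # First 5 rows (this DataFrame has only 4).
print(df.tail())             # Last 5 rows.
print(df.sample(2))          # Randomly selects n rows from the DataFrame.
print(df[df.PRICE > 50])     # Keeps only the rows where PRICE is greater than 50.
print(df['FRUIT'])           # Selects a single column as a Series.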


    FRUIT  QUANTITY  PRICE
0   Mango        40     80
1   Apple        20    100
2  Banana        25     50
3  Orange        10     70
    FRUIT  QUANTITY  PRICE
0   Mango        40     80
1   Apple        20    100
2  Banana        25     50
3  Orange        10     70
    FRUIT  QUANTITY  PRICE
2  Banana        25     50
3  Orange        10     70
    FRUIT  QUANTITY  PRICE
0   Mango        40     80
1   Apple        20    100
3  Orange        10     70
0     Mango
1     Apple
2    Banana
3    Orange
Name: FRUIT, dtype: object
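
The filter `df[df.PRICE > 50]` works in two steps: the comparison builds a boolean Series, and indexing with that Series keeps the rows marked True. A minimal sketch, assuming the same fruit data. The variable names `mask`, `expensive`, and `expensive_and_stocked` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'FRUIT': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

mask = df['PRICE'] > 50        # boolean Series, one entry per row
print(mask)

expensive = df[mask]           # keeps only the rows where the mask is True
print(expensive)

# Conditions combine with & (and) or | (or); each one needs its own parentheses.
expensive_and_stocked = df[(df['PRICE'] > 50) & (df['QUANTITY'] >= 20)]
print(expensive_and_stocked)
```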


# DATA Correlation

* corr() method: Calculates the pairwise correlation between the columns of a data set.

* It applies only to numerical columns, which is why the example selects QUANTITY and PRICE. Each result is a floating-point value between -1 and 1.

* 1 means a perfect positive correlation: each time a value goes up in one column, the value in the other column goes up as well.

* -0.9 is just as strong a relationship as 0.9, but in the opposite direction: if one value increases, the other will probably decrease.

* 0 means no linear correlation.
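
To make the ends of the scale concrete, here is a small sketch with made-up numbers (the `demo` DataFrame and its columns are hypothetical, not part of the fruit data): one column rises in step with `x` and another falls in step with it.

```python
import pandas as pd

demo = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'up': [10, 20, 30, 40],      # rises whenever x rises
    'down': [8, 6, 4, 2],        # falls whenever x rises
})

corr = demo.corr()               # pairwise correlation of all columns
print(corr)
```

The `x`/`up` entry comes out at 1 and the `x`/`down` entry at -1, up to floating-point rounding.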



In [None]:
df[["QUANTITY", "PRICE"]].corr()


          QUANTITY     PRICE
QUANTITY  1.000000  0.032026
PRICE     0.032026  1.000000


#PLOT
`plot` is the pandas method for creating charts from a Series or DataFrame. It draws with Matplotlib, whose `pyplot` module is used to display the figures. `pyplot` also has its own `plot()` function.
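
A minimal sketch of both routes, assuming Matplotlib is installed and reusing the fruit data from earlier. The `Agg` backend line is an assumption that lets the code run without a display; in a notebook it can be dropped so the figures render inline.

```python
import matplotlib
matplotlib.use('Agg')            # assumption: headless backend, not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'FRUIT': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# pandas route: DataFrame.plot() draws onto a Matplotlib Axes and returns it.
ax = df.plot(x='FRUIT', y='PRICE', kind='bar', legend=False)
ax.set_ylabel('PRICE')

# pyplot route: create the Axes explicitly and call its plot() method.
fig, ax2 = plt.subplots()
ax2.plot(df['QUANTITY'], df['PRICE'], marker='o')
ax2.set_xlabel('QUANTITY')
ax2.set_ylabel('PRICE')
```

With an interactive backend, `plt.show()` displays both figures.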

