### Pandas
1. Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its core data structures.

### Key Features of Pandas:
1. Fast and efficient DataFrame object with default and customized indexing.
2. Tools for loading data into in-memory data objects from different file formats.
3. Data alignment and integrated handling of missing data.
4. Reshaping and pivoting of data sets.
5. Label-based slicing, indexing and subsetting of large data sets.
6. Columns can be inserted into or deleted from a data structure.
7. Grouping of data for aggregation and transformation.
8. High-performance merging and joining of data.
9. Time Series functionality.

### Data Structures:
1. Series: 1D labeled homogeneous array, size-immutable.
2. DataFrame: General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
3. Panel: General 3D labeled, size-mutable array. Panel was deprecated in pandas 0.20 and removed in pandas 0.25; a DataFrame with a MultiIndex covers the same use cases.

###### Note: DataFrame is the most widely used of these data structures. Panel was rarely used and is not available in current pandas releases.

#### Pandas - Series:
1. A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

2. A Series can be created from:
    1. Array
    2. Dict
    3. Scalar Value or constant
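
The array case is shown in the next cell. As a minimal sketch of the other two cases (the variable names and sample values here are illustrative, not from the original notebook), a Series can be built from a dict or a scalar like this:

```python
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a dict with an explicit index: labels missing from the dict get NaN.
from_dict_reindexed = pd.Series({"a": 1, "b": 2, "c": 3}, index=["b", "c", "d"])

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_dict)
print(from_dict_reindexed)
print(from_scalar)
```

When a scalar is used, the index decides the length of the Series.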
    
 

In [None]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

s = pd.Series([1,2,3])
print(s)

myArray =  np.array(['a','b','c','d','e'])
print(myArray)

s = pd.Series(myArray)
print(s)

s1 = Series([6,3,1,5,2,9])
print(s1)
print(s1.values)  # the underlying NumPy array
print(s1.index)   # the index labels (a RangeIndex by default)

In [None]:
cityPopulation_2016 = Series([876543,345678,270987,762423], index=["Kurnool","Kadapa","Anantapur","Chitoor"])
cityPopulation_2016

In [None]:
cityPopulation_2016["Kadapa"]

In [None]:
cityPopulation_2016[cityPopulation_2016 > 500000]

In [None]:
# the `in` operator checks the index labels, not the values
'Kurnool' in cityPopulation_2016

In [None]:
cities = ["Kurnool","Kadapa","Anantapur","Chitoor","Nellore"]

# "Nellore" is not in cityPopulation_2016, so its value becomes NaN
Ser = Series(cityPopulation_2016, index=cities)

print(Ser)

In [None]:
# adding two series: values are matched by index label, and a label missing from either side gives NaN

newSer = Ser + cityPopulation_2016
print(newSer)
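
Label alignment is easier to see on a small case. The sketch below uses two made-up Series, `a` and `b`, and compares the `+` operator with `Series.add` and its `fill_value` argument:

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2], index=["x", "y"])

# The + operator aligns on labels; "z" has no partner in b, so the result is NaN.
plain_sum = a + b

# Series.add with fill_value treats a label missing from one side as 0 instead.
filled_sum = a.add(b, fill_value=0)

print(plain_sum)
print(filled_sum)
```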

In [None]:
# Naming the series

newSer.name = "Rayalaseema City Population"

# Naming the index
newSer.index.name = "City"

print(newSer)

### DataFrames:


In [None]:
df=pd.read_csv("/Users/Sivaram/Desktop/Learning/Training/Kelly/xlsx/Salaries.csv")

print("Head Data: \n ", df.head())

print("Tail Data: \n", df.tail())


In [None]:
df.info()

In [None]:
# get first five values of single column

df['last_name'].head()

In [None]:
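# build a new DataFrame with the listed columns; a listed column that is not in df is added and filled with NaN
# (the result is only displayed, df itself is not modified)
DataFrame(df,columns=['last_name','first_name','Age'])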


In [None]:
# get the record at position 4 (the fifth row, because positions start at 0)

df.iloc[4]

In [None]:
# Add new column double salary using base salary
df['Double Salary'] = df['base_salary']*2
df.head()

In [None]:
# Add new column with default value
df['Taxes'] = "Basic Tax"
df.head()

In [None]:
# assigning a Series aligns on the index: rows 0 and 6 get the values, every other row becomes NaN
tax_index = Series(["Central Tax", "State Tax"], index=[0,6])
df['Taxes'] = tax_index
df.head(10)
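
Because the assignment replaces the whole column, the earlier default value is lost for the unmatched rows. A minimal sketch on a hypothetical three-row frame (`people` and its contents are made up for illustration) shows the alignment and one way to restore a default with `fillna`:

```python
import pandas as pd

# Hypothetical stand-in for the salaries table.
people = pd.DataFrame({"last_name": ["Adams", "Baker", "Clark"]})

# Only rows 0 and 2 have a matching label in the Series; row 1 becomes NaN.
people["Taxes"] = pd.Series(["Central Tax", "State Tax"], index=[0, 2])
missing_before = people["Taxes"].isna().tolist()

# Fill the rows that had no match with a default value.
people["Taxes"] = people["Taxes"].fillna("Basic Tax")
print(people)
```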

In [None]:
# deleting the column 'Taxes'
del df['Taxes']


In [None]:
df


In [None]:
## DataFrame from a dictionary

Myfruits = {'fruits':['Orange','Mango','Grapes','Guava'],'Price' :[20,50,90,100]}
dfFruits = DataFrame(Myfruits)
dfFruits

In [None]:
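## indexing

colors = Series([6,4,5,9,7],index=['Red','Blue','Green','Brown','Violet'])
colors

In [None]:
### Reindexing: a label that is not in the original index ('Pink') gets NaN
colors.reindex(['Red','Blue','Green','Brown','Violet', 'Pink'])


In [None]:
### fill_value supplies the value for new labels instead of NaN
colors.reindex(['Red','Blue','Green','Brown','Violet','Pink'],fill_value=10)


In [None]:
### reindex returns a new Series; the original is unchanged
colors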

In [None]:
## Reindexing rows, columns or both

dframe = DataFrame(np.random.randint(25,size=25).reshape((5,5)),index=['A','B','D','E','F'],
                   columns=['col1','col2','col3','col4','col5'])

dframe
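
The cell above only builds the frame. As a minimal sketch of reindexing itself, the example below uses a smaller hypothetical frame named `grid` with fixed values so the result is predictable:

```python
import numpy as np
import pandas as pd

grid = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    index=["A", "B", "D"],
    columns=["col1", "col2", "col3"],
)

# Reindex rows: the new label "C" has no data, so its row is NaN.
rows = grid.reindex(["A", "B", "C", "D"])

# Reindex columns: "col4" is new and is filled with 0 instead of NaN.
cols = grid.reindex(columns=["col1", "col2", "col3", "col4"], fill_value=0)

# Reindex both axes in one call.
both = grid.reindex(index=["A", "B", "C", "D"], columns=["col1", "col4"])

print(rows, cols, both, sep="\n\n")
```

Passing a list positionally reindexes the rows; the `index=` and `columns=` keywords make the target axis explicit.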

In [None]:
## Summing up columns and rows
print(dframe.sum(axis=1))  ## one total per row (sums across the columns)
print(dframe.sum(axis=0))  ## one total per column (sums down the rows)

In [None]:
dframe.describe()

In [None]:
corr = dframe.corr()

In [None]:
### Plot the correlation matrix as a heatmap using Seaborn, a data visualization library
import seaborn as sns
%matplotlib inline
sns.heatmap(corr)

In [None]:
### Missing data in dataframes
import pandas as pd
# np.nan is the supported spelling; the np.NaN and np.NAN aliases were removed in NumPy 2.0
df = DataFrame([[5,4,3,2,7],[6,3,np.nan,np.nan,9],[np.nan,3,4,2,np.nan]])
df


In [None]:
## drop every column that contains at least one null value (axis=1)

df1 = df.dropna(axis=1)
df1

In [None]:
## keep only the rows that have at least 3 non-null values
df2 = df.dropna(thresh=3)
df2
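
The `dropna` options are easy to mix up, so here is a minimal sketch that applies several of them to the same values as the frame above (rebuilt here as `data` so the example is self-contained):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [[5, 4, 3, 2, 7],
     [6, 3, np.nan, np.nan, 9],
     [np.nan, 3, 4, 2, np.nan]]
)

rows_complete = data.dropna()          # keep rows without any NaN
cols_complete = data.dropna(axis=1)    # keep columns without any NaN
at_least_4 = data.dropna(thresh=4)     # keep rows with 4 or more non-NaN values
not_all_nan = data.dropna(how="all")   # drop only rows where every value is NaN

print(rows_complete, cols_complete, at_least_4, not_all_nan, sep="\n\n")
```

Rows 1 and 2 each hold exactly three non-null values, so `thresh=3` keeps all three rows while `thresh=4` keeps only row 0.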

In [None]:
### fill NaN values with a number (a string such as '5' would turn the numeric columns into object columns)
df.fillna(5)
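
A single constant is not always the right replacement. The sketch below, on a small made-up frame named `data`, shows three common choices: one constant, a different constant per column, and each column's own mean:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

constant = data.fillna(0)                      # the same value everywhere
per_column = data.fillna({"a": -1, "b": 99})   # a different value for each column
col_mean = data.fillna(data.mean())            # each column's own mean

print(constant, per_column, col_mean, sep="\n\n")
```

`fillna` returns a new frame, so `data` itself still contains the NaN values.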

In [None]:
### Drop rows: drop(1) removes the row whose index label is 1

df.drop(1)

In [None]:
dframe = DataFrame(np.random.randint(25,size=25).reshape((5,5)),index=['A','B','D','E','F'],
                   columns=['col1','col2','col3','col4','col5'])

In [None]:
### Selecting only 2 columns

print(dframe[['col1','col3']])
print(dframe)

In [None]:
### selection based on index
## use .loc for selection based on the row label
## use .iloc for selection based on the integer position of the row (starting at 0)

print(dframe)
print(dframe.loc['B'])
print(dframe.iloc[2])
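
One difference between the two accessors that often causes surprises is slicing. A minimal sketch on a hypothetical frame named `table` with fixed values:

```python
import pandas as pd

table = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": [1, 2, 3]},
    index=["A", "B", "D"],
)

by_label = table.loc["B", "col2"]    # row labeled "B", column "col2"
by_position = table.iloc[1, 1]       # second row, second column
label_slice = table.loc["A":"B"]     # label slices include the end label
position_slice = table.iloc[0:1]     # position slices exclude the end position

print(by_label, by_position)
print(label_slice)
print(position_slice)
```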


In [None]:
## Selection on condition
dframe[dframe['col1'] > 10]

In [None]:
### selecting values
ser = Series([6,4,3,2],index=['A','B','C','D'])

ser

In [None]:
print("The value at label 'C': ", ser['C'])

print("The first two rows (positions 0 and 1): \n", ser.iloc[0:2])



In [None]:
### Sorting
ser = Series([6,4,2,1],index=['P','R','S','Q'])

ser

In [None]:
ser.sort_index() # sort by index labels


In [None]:
ser.sort_values() ## sort by values, ascending by default


In [None]:
ser.rank() # rank each value (1 = smallest); the order of the rows is not changed
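
Ranking differs from sorting mainly in how ties are handled. The sketch below uses a made-up Series named `scores` that contains a tie, and shows the default `average` method next to `method="min"` and a descending rank:

```python
import pandas as pd

scores = pd.Series([6, 4, 4, 1], index=["P", "R", "S", "Q"])

average_rank = scores.rank()                # ties share the mean of their ranks
min_rank = scores.rank(method="min")        # ties share the lowest rank
descending = scores.rank(ascending=False)   # the largest value gets rank 1

print(average_rank, min_rank, descending, sep="\n\n")
```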


In [None]:
### MultiIndexing

Sr = Series([5,4,3,7,9,6],index=[[1,1,2,2,3,3],['a','b','a','b','a','b']])
Sr

In [None]:
## verifying indexing levels
Sr.index


In [None]:
### Get values by the first index level

Sr[3]

In [None]:
### Get values by the second index level
Sr[:, 'b']

In [None]:
Sr.unstack()
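
`unstack` pivots the inner index level into columns, and `stack` reverses it. A minimal sketch with the same values as above (rebuilt as `nested` so the example is self-contained), plus `xs` as an alternative way to select on the second level:

```python
import pandas as pd

nested = pd.Series(
    [5, 4, 3, 7, 9, 6],
    index=[[1, 1, 2, 2, 3, 3], ["a", "b", "a", "b", "a", "b"]],
)

wide = nested.unstack()             # inner level becomes the columns "a" and "b"
long_again = wide.stack()           # columns go back to an inner index level
inner_b = nested.xs("b", level=1)   # cross-section on the second level

print(wide)
print(long_again)
print(inner_b)
```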


In [None]:
### Group by

df_csv = pd.read_csv("/Users/Sivaram/Desktop/Learning/Training/Kelly/xlsx/Salaries.csv")
df_csv.head()

In [None]:
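# count the non-null values per (club, position) group and show the five groups with the most base_salary entries
(df_csv.groupby(['club','position'])
       .count()
       .sort_values(by='base_salary', ascending=False)
       .head(5))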
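
Since the CSV file is not included here, the sketch below uses a few made-up rows in a frame named `players`. Only the column names `club`, `position` and `base_salary` come from the notebook; the values are hypothetical. It computes several aggregates per group in one pass with `agg`:

```python
import pandas as pd

# Hypothetical rows standing in for Salaries.csv.
players = pd.DataFrame({
    "club": ["X", "X", "X", "Y", "Y"],
    "position": ["F", "F", "M", "F", "M"],
    "base_salary": [100.0, 300.0, 200.0, 50.0, 150.0],
})

summary = (
    players.groupby(["club", "position"])["base_salary"]
    .agg(["count", "mean", "max"])           # one column per aggregate
    .sort_values("count", ascending=False)   # largest groups first
)

print(summary)
```

Selecting the `base_salary` column before aggregating keeps the result to the statistics of interest instead of one count per column.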
