# Pandas Tutorials

> - Pandas is used for processing data: loading, manipulating, preparing, modeling, and analyzing it. <br>
> - Pandas is built on top of the NumPy package, so NumPy is required to work with Pandas. <br>
> - Pandas has two main data structures for processing data.
> > 1. Series    --> a one-dimensional labeled array that can store values of any data type.
> > 2. DataFrame --> a two-dimensional labeled structure with labeled axes (rows and columns).
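
As a quick orientation, the two structures listed above can be sketched as follows. The names `prices`, `qty`, and `table` are made up for illustration and do not appear elsewhere in this tutorial.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
prices = pd.Series([1.5, 2.0, 3.25], index=["x", "y", "z"])
qty = pd.Series([10, 20, 30], index=["x", "y", "z"])

# A DataFrame: several columns that share one row index.
table = pd.DataFrame({"price": prices, "qty": qty})

print(prices.ndim, table.ndim)   # number of dimensions of each structure
print(table.shape)               # (rows, columns)
```

Selecting a single column of `table` gives back a Series, which is why the two structures are usually introduced together.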

# Import required modules

In [2]:
import numpy as np
import pandas as pd

# A) Series in Pandas

In [21]:
# A Series is a one-dimensional labeled array that can store various data types.
# The row labels of a Series are called the index.
# A Series can hold integers, strings, floating-point numbers, and Python objects.
# It holds a single column of values; use a DataFrame for multiple columns.

print("Series Example")
s = pd.Series(np.arange(15,20))
s

Series Example


0    15
1    16
2    17
3    18
4    19
dtype: int32

### A1) Series from ndarray

In [39]:
print("Series From ndarray. \n\nSeries:")

s = pd.Series(np.random.randn(5), index=['a','b','c','d','e'])
print(s)

Series From ndarray. 

Series:
a   -1.140473
b    0.327533
c   -1.009269
d   -0.430283
e   -0.355748
dtype: float64


### A2) Series from Dictionary

In [38]:
print("Series From Dictionary. \n\nSeries:")

# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0.
dict1 = {'p':111, 'q':222, 'r':333, 's':np.nan, 't':555}
s = pd.Series(dict1)
print(s)

Series From Dictionary. 

Series:
p    111.0
q    222.0
r    333.0
s      NaN
t    555.0
dtype: float64
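
The output above shows `float64` even though the dictionary holds integers, because a `NaN` entry forces the whole Series to a floating-point dtype. The sketch below illustrates this, along with two related behaviours: labels requested in `index` but missing from the dictionary become `NaN`, and the nullable `"Int64"` dtype is one possible way to keep integers alongside missing values. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A NaN among integers upcasts the Series to float64.
as_float = pd.Series({"p": 111, "s": np.nan, "t": 555})

# Labels listed in index but absent from the dict are filled with NaN.
reindexed = pd.Series({"p": 111, "q": 222}, index=["p", "q", "z"])

# The nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([111, None, 555], index=["p", "s", "t"], dtype="Int64")

print(as_float.dtype, reindexed.dtype, nullable.dtype)
```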


### A3) Series from Scalar value

In [41]:
print("Series From Scalar value. Same value will be repeated with index length. \n\nSeries:")

s = pd.Series(125, index=['i','j','k','l'])
print(s)

Series From Scalar value. Same value will be repeated with index length. 

Series:
i    125
j    125
k    125
l    125
dtype: int64


### A4) Series functionalities

In [45]:
print("Series will work like ndarray. Slice operations \n\nSeries:")

# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0.
dict1 = {'p':111, 'q':222, 'r':333, 's':np.nan, 't':555}
s = pd.Series(dict1)

print(s)

Series will work like ndarray. Slice operations 

Series:
p    111.0
q    222.0
r    333.0
s      NaN
t    555.0
dtype: float64


In [61]:
# Use .iloc for positional access; plain s[1] on a label-indexed Series is deprecated in recent pandas.
print("Slice:\ns.iloc[1]:", s.iloc[1])
print("s['r']:", s['r'])

print("\n##############################")
print("Filters:s[s > 200]:\n", s[s > 200])

print("\n##############################")
print("Select multiple positions:s.iloc[[0,2,4]]:\n", s.iloc[[0,2,4]])

print("\n##############################")
print("Check DType:", s.dtype)



Slice:
s.iloc[1]: 222.0
s['r']: 333.0

##############################
Filters:s[s > 200]:
 q    222.0
r    333.0
t    555.0
dtype: float64

##############################
Select multiple positions:s.iloc[[0,2,4]]:
 p    111.0
r    333.0
t    555.0
dtype: float64

##############################
Check DType: float64
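
The cell above picks single elements and filters, but it does not show an actual range slice. A minimal sketch of the two slicing styles follows, using the same values as the Series above. The point to notice is that label slices with `.loc` include the end label, while positional slices with `.iloc` exclude the end position.

```python
import numpy as np
import pandas as pd

s = pd.Series({"p": 111.0, "q": 222.0, "r": 333.0, "s": np.nan, "t": 555.0})

# Label-based slice: both endpoints are included.
by_label = s.loc["q":"s"]

# Position-based slice: the end position is excluded, as with Python lists.
by_position = s.iloc[1:3]

# Drop the missing value before further processing, if desired.
without_nan = s.dropna()

print(list(by_label.index))     # labels selected by .loc
print(list(by_position.index))  # labels selected by .iloc
```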


In [64]:
print("\n##############################")
print("Sum 2 Series: s + s :\n", s + s)

print("\n##############################")
print("Multiply by 5 : s*5 :\n", s*5)


##############################
Sum 2 Series: s + s :
 p     222.0
q     444.0
r     666.0
s       NaN
t    1110.0
dtype: float64

##############################
Multiply by 5 : s*5 :
 p     555.0
q    1110.0
r    1665.0
s       NaN
t    2775.0
dtype: float64
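
Adding a Series to itself hides one detail: arithmetic between two Series is matched by index label, not by position. The sketch below uses two small, made-up Series (`a` and `b`) whose labels only partly overlap. Labels present on one side only yield `NaN`, unless a `fill_value` is supplied through the `add` method.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3], index=["p", "q", "r"])
b = pd.Series([10, 20, 30], index=["q", "r", "s"])

# Values are paired by label; "p" and "s" have no partner and become NaN.
total = a + b

# Treat a missing partner as 0 instead of producing NaN.
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```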


# B) DataFrame in Pandas

In [134]:
# A DataFrame is a two-dimensional labeled structure with labeled axes (rows and columns).
# A DataFrame resembles a structured table or an Excel sheet.
df = pd.DataFrame(np.random.randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df

Unnamed: 0,W,X,Y,Z
A,-1.003719,-1.354529,0.23504,0.389252
B,-2.318651,-0.456101,0.01288,-1.429024
C,-0.178227,1.669811,-0.570908,0.373214
D,0.397618,-0.942842,1.867151,-1.172789
E,-1.396274,0.702433,-1.543946,-1.039696


### B1) DataFrame from List

In [74]:
print("DataFrame Example using List:\n DataFrame:")

l1 = [2,3,4,5,6,7]
df = pd.DataFrame(l1, index = ['a','b','c','d','e','f'], columns = ['ID_NUM'])
df

DataFrame Example using List:
 DataFrame:


Unnamed: 0,ID_NUM
a,2
b,3
c,4
d,5
e,6
f,7


### B2) DataFrame from Dict

In [76]:
print("DataFrame Example using Dict:\n DataFrame:")

dict1 = {"ID":[101,102,103,104,105], "Name":['AAA','BBB','CCC','DDD','EEE']}
df = pd.DataFrame(dict1)
df

DataFrame Example using Dict:
 DataFrame:


Unnamed: 0,ID,Name
0,101,AAA
1,102,BBB
2,103,CCC
3,104,DDD
4,105,EEE


### B3) DataFrame from ndarray

In [136]:
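print("DataFrame Example using ndarray:\n DataFrame:")

a = np.random.rand(10,5)
# Note: wrapping the index in a list produces a single-level MultiIndex (visible in the df.index output in B4).
# Pass np.arange(2000,2010) directly for a plain integer index.
df = pd.DataFrame(a, index = [np.arange(2000,2010)], columns = ['India', 'USA', 'China', 'Japan', 'Italy'])
df

DataFrame Example using ndarray: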
 DataFrame:


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.35029,0.346292,0.096251,0.525595,0.110623
2001,0.915823,0.404017,0.370374,0.360151,0.609577
2002,0.493764,0.048177,0.232806,0.579379,0.402962
2003,0.340133,0.853572,0.694253,0.09346,0.409661
2004,0.621391,0.791143,0.780259,0.710771,0.64659
2005,0.013221,0.796848,0.01163,0.557202,0.393944
2006,0.878574,0.592034,0.266376,0.668363,0.242256
2007,0.363322,0.100547,0.721351,0.829981,0.859235
2008,0.788926,0.191501,0.716643,0.103029,0.197034
2009,0.503605,0.053783,0.313549,0.118274,0.834089


### B4) DataFrame Basic functions

In [102]:
print("DataFrame count: \n\n", df.count())

DataFrame count: 

 India    10
USA      10
China    10
Japan    10
Italy    10
dtype: int64


In [103]:
print("DataFrame Columns list: \n\n", df.columns)

DataFrame Columns list: 

 Index(['India', 'USA', 'China', 'Japan', 'Italy'], dtype='object')


In [130]:
print("DataFrame index (Row Name)list: \n\n", df.index)

DataFrame index (Row Name)list: 

 MultiIndex(levels=[[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
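
The `MultiIndex` in the output above comes from passing the year array wrapped in a list (`index = [np.arange(2000,2010)]`) when the DataFrame was built. The following sketch contrasts the two forms on a small, made-up frame; `wrapped` and `plain` are illustrative names.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2003)
data = np.zeros((3, 2))

# A list containing one array is interpreted as the levels of a MultiIndex.
wrapped = pd.DataFrame(data, index=[years], columns=["A", "B"])

# Passing the array itself gives an ordinary single-level index.
plain = pd.DataFrame(data, index=years, columns=["A", "B"])

print(type(wrapped.index).__name__, wrapped.index.nlevels)
print(type(plain.index).__name__)
```

For simple year labels, the plain form is usually what is wanted, and it is the form used in section B6.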


In [108]:
print("DataFrame Shape(Rows X Columns): \n\n", df.shape)

DataFrame Shape(Rows X Columns): 

 (10, 5)


In [129]:
print("DataFrame DataTypes of each column.\n")
df.dtypes

DataFrame DataTypes of each column.



India    float64
USA      float64
China    float64
Japan    float64
Italy    float64
dtype: object

In [112]:
print("DataFrame Information: \n")
df.info()

DataFrame Information: 

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10 entries, (2000,) to (2009,)
Data columns (total 5 columns):
India    10 non-null float64
USA      10 non-null float64
China    10 non-null float64
Japan    10 non-null float64
Italy    10 non-null float64
dtypes: float64(5)
memory usage: 506.0 bytes


In [113]:
print("DataFrame Description: \n")
df.describe()

DataFrame Description: 



Unnamed: 0,India,USA,China,Japan,Italy
count,10.0,10.0,10.0,10.0,10.0
mean,0.478931,0.483407,0.400916,0.357733,0.49002
std,0.300466,0.258164,0.256427,0.350324,0.254655
min,0.013015,0.108695,0.026165,0.011144,0.024486
25%,0.262154,0.33304,0.241592,0.042462,0.360462
50%,0.440756,0.472644,0.376786,0.29639,0.482053
75%,0.75288,0.713159,0.602218,0.60405,0.653029
max,0.887407,0.833094,0.758114,0.881438,0.847797


In [124]:
print("DataFrame Sample Data: Given 3 records as Sample.\n")
df.sample(3) 

DataFrame Sample Data: Given 3 records as Sample.



Unnamed: 0,India,USA,China,Japan,Italy
2005,0.368619,0.332193,0.758114,0.422307,0.360263
2002,0.384851,0.374839,0.026165,0.439566,0.41668
2006,0.765961,0.833094,0.405134,0.029272,0.557857


In [125]:
print("DataFrame Pick top 3 Records.\n")
df.head(3) 

DataFrame Pick top 3 Records.



Unnamed: 0,India,USA,China,Japan,Italy
2000,0.787045,0.749275,0.097031,0.881438,0.821116
2001,0.887407,0.108695,0.748217,0.011144,0.547425
2002,0.384851,0.374839,0.026165,0.439566,0.41668


In [127]:
print("DataFrame Pick bottom 3 Records.\n")
df.tail(3) 

DataFrame Pick bottom 3 Records.



Unnamed: 0,India,USA,China,Japan,Italy
2007,0.013015,0.335582,0.655592,0.658877,0.278767
2008,0.49666,0.570448,0.309379,0.170474,0.361058
2009,0.226666,0.604811,0.348438,0.876197,0.684754


### B5) DataFrame Columns with Select, Create, Rename, drop
> ###  Axis = 0 --> for Rows
> ###  Axis = 1 --> for Columns
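
The axis convention above can be checked on a tiny frame. In this sketch (with made-up numbers), `axis=0` works down the rows and so produces one result per column, while `axis=1` works across the columns and produces one result per row. For `drop`, the axis says which kind of label is being removed.

```python
import pandas as pd

df = pd.DataFrame({"India": [1.0, 2.0], "USA": [3.0, 4.0]}, index=[2000, 2001])

col_sums = df.sum(axis=0)   # one value per column
row_sums = df.sum(axis=1)   # one value per row

no_row = df.drop(2000, axis=0)    # remove the row labeled 2000
no_col = df.drop("USA", axis=1)   # remove the column named USA

print(col_sums)
print(row_sums)
```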

In [3]:
print("DataFrame data looks like below. DataFrame DF:\n")

a = np.random.rand(10,5)
# Pass the array directly; wrapping it in a list would create a MultiIndex.
df = pd.DataFrame(a, index = np.arange(2000,2010), columns = ['India', 'USA', 'China', 'Japan', 'Italy'])
df

DataFrame data looks like below. DataFrame DF:



Unnamed: 0,India,USA,China,Japan,Italy
2000,0.789466,0.871106,0.397255,0.430093,0.122103
2001,0.17192,0.571786,0.034432,0.858819,0.455361
2002,0.726382,0.059431,0.349691,0.825809,0.744944
2003,0.394756,0.744608,0.739636,0.881992,0.577439
2004,0.873255,0.735816,0.297974,0.161205,0.259532
2005,0.133222,0.070494,0.987739,0.805101,0.402493
2006,0.164766,0.982302,0.564084,0.608949,0.1376
2007,0.594084,0.783697,0.528061,0.682777,0.651184
2008,0.421628,0.971059,0.862977,0.123983,0.625146
2009,0.120628,0.550608,0.24349,0.364064,0.713067


#### B5a) Select Columns

In [140]:
#Select only a particular column
print("Select only India column")
df['India']

Select only India column


2000    0.399694
2001    0.619891
2002    0.344944
2003    0.962349
2004    0.859724
2005    0.787853
2006    0.615568
2007    0.882948
2008    0.521790
2009    0.628223
Name: India, dtype: float64

In [143]:
#Check the type of a column: a DataFrame is a collection of Series.
print("Type of India column:")
type(df['India'])

Type of India column:


pandas.core.series.Series

In [141]:
#Select list of columns

print("Select only India, USA columns")
df[['India', 'USA']]

Select only India, USA columns


Unnamed: 0,India,USA
2000,0.399694,0.322446
2001,0.619891,0.418961
2002,0.344944,0.090686
2003,0.962349,0.77866
2004,0.859724,0.163855
2005,0.787853,0.108535
2006,0.615568,0.476624
2007,0.882948,0.895825
2008,0.52179,0.097142
2009,0.628223,0.021361


#### B5b) Create New columns

In [150]:
print("Create India+USA column data as IND_USA")
df['IND_USA'] = df['India'] + df['USA']
df

Create India+USA column data as IND_USA


Unnamed: 0,India,USA,China,Japan,Italy,IND_USA
2000,0.399694,0.322446,0.949832,0.477198,0.108064,0.72214
2001,0.619891,0.418961,0.866907,0.342787,0.545301,1.038853
2002,0.344944,0.090686,0.114429,0.015913,0.025554,0.435631
2003,0.962349,0.77866,0.244048,0.399331,0.613096,1.741009
2004,0.859724,0.163855,0.693371,0.818648,0.967456,1.023578
2005,0.787853,0.108535,0.568081,0.566507,0.167816,0.896388
2006,0.615568,0.476624,0.894124,0.565522,0.9303,1.092192
2007,0.882948,0.895825,0.473075,0.590638,0.769765,1.778772
2008,0.52179,0.097142,0.189303,0.039195,0.104441,0.618932
2009,0.628223,0.021361,0.917344,0.207355,0.868885,0.649584


#### B5c) Rename columns

In [151]:
print("Rename IND_USA column as IND_puls_USA")
df = df.rename(columns={'IND_USA':'IND_puls_USA'})
df

Rename IND_USA column as IND_puls_USA


Unnamed: 0,India,USA,China,Japan,Italy,IND_puls_USA
2000,0.399694,0.322446,0.949832,0.477198,0.108064,0.72214
2001,0.619891,0.418961,0.866907,0.342787,0.545301,1.038853
2002,0.344944,0.090686,0.114429,0.015913,0.025554,0.435631
2003,0.962349,0.77866,0.244048,0.399331,0.613096,1.741009
2004,0.859724,0.163855,0.693371,0.818648,0.967456,1.023578
2005,0.787853,0.108535,0.568081,0.566507,0.167816,0.896388
2006,0.615568,0.476624,0.894124,0.565522,0.9303,1.092192
2007,0.882948,0.895825,0.473075,0.590638,0.769765,1.778772
2008,0.52179,0.097142,0.189303,0.039195,0.104441,0.618932
2009,0.628223,0.021361,0.917344,0.207355,0.868885,0.649584


#### B5d) Drop existing columns

In [152]:
print("Drop IND_puls_USA column from DataFrame. axis=1 for columns.")
df = df.drop('IND_puls_USA', axis=1)
df

Drop IND_puls_USA column from DataFrame. axis=1 for columns.


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.399694,0.322446,0.949832,0.477198,0.108064
2001,0.619891,0.418961,0.866907,0.342787,0.545301
2002,0.344944,0.090686,0.114429,0.015913,0.025554
2003,0.962349,0.77866,0.244048,0.399331,0.613096
2004,0.859724,0.163855,0.693371,0.818648,0.967456
2005,0.787853,0.108535,0.568081,0.566507,0.167816
2006,0.615568,0.476624,0.894124,0.565522,0.9303
2007,0.882948,0.895825,0.473075,0.590638,0.769765
2008,0.52179,0.097142,0.189303,0.039195,0.104441
2009,0.628223,0.021361,0.917344,0.207355,0.868885


### B6) DataFrame Rows with Select, Create, Rename, drop
> ####  Axis = 0 --> for Rows, Axis = 1 --> for Columns

> ###  Row selection:
     >> #### loc[] --> used when we know the index label of a particular row
     >> #### iloc[] --> used when we don't know the index label but know the row's integer position
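
With integer labels such as years, the two indexers are easy to confuse, so a minimal sketch may help. It uses a small frame with made-up values; the row labeled 2005 sits at position 5. Note also that a `loc` slice includes its end label while an `iloc` slice excludes its end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, 2),
                  index=np.arange(2000, 2010), columns=["India", "USA"])

by_label = df.loc[2005]     # row whose label is 2005
by_position = df.iloc[5]    # sixth row, which is the same row here

label_slice = df.loc[2001:2003]   # labels 2001, 2002, 2003
position_slice = df.iloc[1:3]     # positions 1 and 2 only

# A label is not a position: there is no row at position 2005.
try:
    df.iloc[2005]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(len(label_slice), len(position_slice), out_of_bounds)
```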

In [23]:
print("DataFrame data looks like below. DataFrame DF:\n")

a = np.random.rand(10,5)
df = pd.DataFrame(a, index = np.arange(2000,2010), columns = ['India', 'USA', 'China', 'Japan', 'Italy'])
df

DataFrame data looks like below. DataFrame DF:



Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,0.258122,0.354631,0.425806,0.225605,0.821841
2002,0.034187,0.582727,0.11132,0.747159,0.44848
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2004,0.256141,0.731758,0.882233,0.725983,0.274646
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2009,0.060025,0.444211,0.406957,0.579171,0.801691


#### B6a) Select rows by using loc() or iloc() 

In [29]:
#Select rows with loc[] by index label. The df has a row labeled 2005, so we use that label.
print("Example of loc[] when we know the index label, Ex: 2005")

df.loc[2005]


Example of loc[] when we know the index label, Ex: 2005


India    0.703782
USA      0.359889
China    0.542465
Japan    0.002443
Italy    0.981200
Name: 2005, dtype: float64

In [30]:
print("Example of loc[] when we know the index labels, Ex: 2005 and 2007")

df.loc[[2005, 2007]]

Example of loc[] when we know the index labels, Ex: 2005 and 2007


Unnamed: 0,India,USA,China,Japan,Italy
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2007,0.484173,0.534549,0.386819,0.42577,0.849953


In [31]:
print("Example of loc[] when we know the index labels, Ex: 2005, 2007 for only India and USA columns")

df.loc[[2005, 2007], ['India', 'USA']]

Example of loc[] when we know the index labels, Ex: 2005, 2007 for only India and USA columns


Unnamed: 0,India,USA
2005,0.703782,0.359889
2007,0.484173,0.534549


In [32]:
print("Example of iloc[] when we know the row position, Ex: 0-->2000, 1-->2001, 2-->2002, etc")
print("Select 1st record in df.")

df.iloc[0]


Example of iloc[] when we know the row position, Ex: 0-->2000, 1-->2001, 2-->2002, etc
Select 1st record in df.


India    0.720400
USA      0.804348
China    0.990995
Japan    0.371329
Italy    0.339612
Name: 2000, dtype: float64

In [44]:
print("Select 1st,3rd records in df. i.e. 2000, 2002 indexed rows.")

df.iloc[[0,2]]

Select 1st,3rd records in df. i.e. 2000, 2002 indexed rows.


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2002,0.034187,0.582727,0.11132,0.747159,0.44848


In [45]:
print("Select 1st,3rd records in df. i.e. 2000, 2002 indexed rows with columns India and USA only.")

df.iloc[[0, 2], [0,1]]
#df.iloc[:3, 1:3]

Select 1st,3rd records in df. i.e. 2000, 2002 indexed rows with columns India and USA only.


Unnamed: 0,India,USA
2000,0.7204,0.804348
2002,0.034187,0.582727


#### B6b) Create new row

In [49]:
print("Method1: using loc[] to add a 2010 row")

df.loc[2010] = np.random.rand(5)
df

Method1: using loc[] to add a 2010 row


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,0.258122,0.354631,0.425806,0.225605,0.821841
2002,0.034187,0.582727,0.11132,0.747159,0.44848
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2004,0.256141,0.731758,0.882233,0.725983,0.274646
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2009,0.060025,0.444211,0.406957,0.579171,0.801691


In [51]:
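print("Method2: iloc[] cannot add rows; use loc[] with a new label (here 11)")

# df.iloc[11] = ... raises IndexError because iloc cannot enlarge the DataFrame.
df.loc[11] = np.random.rand(5)
df

Method2: iloc[] cannot add rows; use loc[] with a new label (here 11)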


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,0.258122,0.354631,0.425806,0.225605,0.821841
2002,0.034187,0.582727,0.11132,0.747159,0.44848
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2004,0.256141,0.731758,0.882233,0.725983,0.274646
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2009,0.060025,0.444211,0.406957,0.579171,0.801691
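
The difference between the two indexers when adding rows can be sketched on a small frame with made-up values. Assigning to a new label through `loc` enlarges the frame, assigning to an out-of-range position through `iloc` raises an `IndexError`, and `pd.concat` is one common alternative when several rows arrive together. The name `new_rows` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"India": [0.1, 0.2], "USA": [0.3, 0.4]}, index=[2008, 2009])

# loc with an unseen label appends a row.
df.loc[2010] = [0.5, 0.6]

# iloc only addresses existing positions, so this fails.
try:
    df.iloc[3] = [0.7, 0.8]
    iloc_failed = False
except IndexError:
    iloc_failed = True

# Appending several rows at once with concat.
new_rows = pd.DataFrame({"India": [0.9], "USA": [1.0]}, index=[2011])
df = pd.concat([df, new_rows])

print(df)
print("iloc raised IndexError:", iloc_failed)
```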


#### B6c) Rename index 

In [57]:
#We can use the rename() method to rename columns and row index labels as well.
print("Rename index 11 to 2011:")

df = df.rename(index={11:2011})

df

Rename index 11 to 2011:


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,0.258122,0.354631,0.425806,0.225605,0.821841
2002,0.034187,0.582727,0.11132,0.747159,0.44848
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2004,0.256141,0.731758,0.882233,0.725983,0.274646
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2009,0.060025,0.444211,0.406957,0.579171,0.801691


#### B6d) Delete rows

In [58]:
print("Example to delete row with 2011 as index name")

df = df.drop(2011)
df

Example to delete row with 2011 as index name


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,0.258122,0.354631,0.425806,0.225605,0.821841
2002,0.034187,0.582727,0.11132,0.747159,0.44848
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2004,0.256141,0.731758,0.882233,0.725983,0.274646
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2009,0.060025,0.444211,0.406957,0.579171,0.801691


### B7) Conditional selection

In [63]:
#In conditional selection, we pass a boolean condition to the DataFrame to select records.
print("conditional selection Example1: df>0.3  ")

df>0.3

conditional selection Example1: df>0.3  


Unnamed: 0,India,USA,China,Japan,Italy
2000,True,True,True,True,True
2001,False,True,True,False,True
2002,False,True,False,True,True
2003,True,False,True,True,False
2004,False,True,True,True,False
2005,True,True,True,False,True
2006,True,False,True,False,False
2007,True,True,True,True,True
2008,True,True,False,True,False
2009,False,True,True,True,True


In [65]:
print("conditional selection Example2: df>0.3 apply this boolean to df. if True then value, else NaN.")

df[df>0.3]

conditional selection Example2: df>0.3 apply this boolean to df. if True then value, else NaN.


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2001,,0.354631,0.425806,,0.821841
2002,,0.582727,,0.747159,0.44848
2003,0.791072,,0.942088,0.854797,
2004,,0.731758,0.882233,0.725983,
2005,0.703782,0.359889,0.542465,,0.9812
2006,0.354151,,0.446111,,
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,,0.992407,
2009,,0.444211,0.406957,0.579171,0.801691


In [66]:
print("Ex3: If the India value > 0.3, then print all columns")
df[df['India']>0.3]

Ex3: If the India value > 0.3, then print all columns


Unnamed: 0,India,USA,China,Japan,Italy
2000,0.7204,0.804348,0.990995,0.371329,0.339612
2003,0.791072,0.208659,0.942088,0.854797,0.258203
2005,0.703782,0.359889,0.542465,0.002443,0.9812
2006,0.354151,0.024519,0.446111,0.127251,0.017002
2007,0.484173,0.534549,0.386819,0.42577,0.849953
2008,0.690787,0.433837,0.116532,0.992407,0.016669
2010,0.886171,0.72547,0.377946,0.314606,0.278101


In [68]:
print("Ex4: If the India value > 0.3, then print only India, USA columns")
# A single loc[] call selects the rows and columns together, avoiding chained indexing.
df.loc[df['India']>0.3, ['India', 'USA']]

Ex4: If the India value > 0.3, then print only India, USA columns


Unnamed: 0,India,USA
2000,0.7204,0.804348
2003,0.791072,0.208659
2005,0.703782,0.359889
2006,0.354151,0.024519
2007,0.484173,0.534549
2008,0.690787,0.433837
2010,0.886171,0.72547
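
The examples above each use one condition. Several conditions can be combined as in the sketch below, which uses a small frame with made-up values. Each condition needs its own parentheses, and the element-wise operators `&` (and) and `|` (or) are used instead of Python's `and` / `or`.

```python
import pandas as pd

df = pd.DataFrame({"India": [0.7, 0.2, 0.8],
                   "USA":   [0.8, 0.9, 0.1],
                   "China": [0.5, 0.5, 0.5]},
                  index=[2000, 2001, 2002])

# Rows where both conditions hold, restricted to two columns.
both = df.loc[(df["India"] > 0.3) & (df["USA"] > 0.3), ["India", "USA"]]

# Rows where at least one condition holds.
either = df[(df["India"] > 0.3) | (df["USA"] > 0.3)]

print(both)
print(either)
```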


### B8) More about Index operations

In [78]:
print("Dataframe looks like below.")
df = pd.DataFrame(np.random.rand(10,5), columns = "A B C D E".split(), index = np.arange(2000,2010))
df

Dataframe looks like below.


Unnamed: 0,A,B,C,D,E
2000,0.705286,0.935592,0.982698,0.232227,0.471225
2001,0.690445,0.703925,0.417257,0.4702,0.917374
2002,0.347052,0.259886,0.744818,0.710066,0.337093
2003,0.026948,0.453792,0.144959,0.358762,0.806018
2004,0.683328,0.761666,0.347978,0.284357,0.262764
2005,0.765109,0.523028,0.233466,0.041141,0.678384
2006,0.235347,0.952441,0.558369,0.452713,0.598167
2007,0.763978,0.189072,0.56697,0.520119,0.336954
2008,0.525414,0.706064,0.6282,0.31291,0.856458
2009,0.00725,0.182356,0.322726,0.478833,0.255956


#### B8a) Reset_Index()

In [82]:
print("Reset Index of DF to 0,1,2,etc,...")
df.reset_index()


Reset Index of DF to 0,1,2,etc,...


Unnamed: 0,index,A,B,C,D,E
0,2000,0.705286,0.935592,0.982698,0.232227,0.471225
1,2001,0.690445,0.703925,0.417257,0.4702,0.917374
2,2002,0.347052,0.259886,0.744818,0.710066,0.337093
3,2003,0.026948,0.453792,0.144959,0.358762,0.806018
4,2004,0.683328,0.761666,0.347978,0.284357,0.262764
5,2005,0.765109,0.523028,0.233466,0.041141,0.678384
6,2006,0.235347,0.952441,0.558369,0.452713,0.598167
7,2007,0.763978,0.189072,0.56697,0.520119,0.336954
8,2008,0.525414,0.706064,0.6282,0.31291,0.856458
9,2009,0.00725,0.182356,0.322726,0.478833,0.255956
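
Two details of `reset_index()` are worth spelling out, sketched here on a small made-up frame. The call returns a new DataFrame and leaves the original untouched unless the result is assigned back, and `drop=True` discards the old index instead of keeping it as an `index` column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=[2000, 2001, 2002])

kept = df.reset_index()              # old index becomes a column named "index"
dropped = df.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))
print(list(dropped.columns))
print(list(df.index))                # the original frame is unchanged
```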


#### B8b) Set_index()

In [90]:
print("Set_index() Example.")
l1 = [np.arange(2000,2010)]

l1
#df.set_index()



Set_index() Example.


[array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009])]
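
The cell above only builds the candidate index values and never calls `set_index()`. A minimal sketch of the usual pattern follows, assuming the years are stored in a column named `year` (a hypothetical column, not part of the frames used in this tutorial). `set_index()` moves that column into the index, and `reset_index()` reverses the step.

```python
import pandas as pd

df = pd.DataFrame({"year": [2000, 2001, 2002], "A": [0.1, 0.2, 0.3]})

# Promote the "year" column to be the row index.
by_year = df.set_index("year")

# Rows can now be looked up by year label.
value_2001 = by_year.loc[2001, "A"]

# reset_index() turns the index back into a regular column.
restored = by_year.reset_index()

print(by_year)
print(value_2001)
```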

In [91]:
help(df.count)

Help on method count in module pandas.core.frame:

count(axis=0, level=None, numeric_only=False) method of pandas.core.frame.DataFrame instance
    Count non-NA cells for each column or row.
    
    The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
    on `pandas.options.mode.use_inf_as_na`) are considered NA.
    
    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        If 0 or 'index' counts are generated for each column.
        If 1 or 'columns' counts are generated for each **row**.
    level : int or str, optional
        If the axis is a `MultiIndex` (hierarchical), count along a
        particular `level`, collapsing into a `DataFrame`.
        A `str` specifies the level name.
    numeric_only : boolean, default False
        Include only `float`, `int` or `boolean` data.
    
    Returns
    -------
    Series or DataFrame
        For each column/row the number of non-NA/null entries.
        If `level` is specified retu