# **Pandas**

In [0]:
import pandas as pd

**Pandas data structures**



---

pandas has two main data structures: Series and DataFrame.





---

**Series:**

A pandas Series is a one-dimensional labeled array (conceptually a one-dimensional ndarray with an index) that stores values, each associated with an index label. Labels are usually unique, but pandas does not require them to be.
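
As a small illustration of the definition above, the sketch below builds a Series and looks at its two parts, the labels and the values. The variable names `s` and `dup` and the sample data are made up for this example.

```python
import pandas as pd

# A Series pairs each value with an index label.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

print(s.index.tolist())   # the labels
print(s.to_numpy())       # the values as a NumPy array

# Labels may repeat; selecting a repeated label returns every match.
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(dup['a'])
```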



---

[Intro to Data Structures](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html)

[Python Pandas - Series](https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm)


**Example**

In [2]:
s = pd.Series(dtype='float64')  # empty Series; the default dtype of pd.Series() varies between pandas versions
print(s)
print(type(s))

Series([], dtype: float64)
<class 'pandas.core.series.Series'>


**Example**

In [3]:
# Create a Series from a NumPy array
 
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object


**Example**

In [4]:
# Create a Series from a NumPy array with a specified index
 
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f'])
s = pd.Series(data,index=[1000,1001,1002,1003,1004,1005])
print(s)

1000    a
1001    b
1002    c
1003    d
1004    e
1005    f
dtype: object


***Create a Series from dict***


**Example**

In [5]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64


**Example**

In [6]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
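
In the output above, label `d` has no matching key in the dict, so pandas fills its value with `NaN`. The sketch below shows a few common ways to detect or handle such entries; the fill value `-1.0` is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# isna() flags labels that had no matching key in the dict.
print(s.isna())

# dropna() returns a new Series without the missing entries.
print(s.dropna())

# fillna() returns a new Series with a placeholder value instead.
print(s.fillna(-1.0))
```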


**Accessing Data from Series with Position**

In [7]:
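s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the elements at positions 0, 1 and 3
# (iloc selects by integer position, whatever the index labels are)
print(s.iloc[0])

print(s.iloc[1])

print(s.iloc[3])

print('\n')

# retrieve the first three elements at once
print(s.iloc[:3])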

1
2
4


a    1
b    2
c    3
dtype: int64


**Retrieve a single element using index label value.**

In [8]:
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element
print(s['a'])

1
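
Label-based and position-based selection can be written explicitly with `loc` and `iloc`. The following is a minimal sketch on the same Series; note that label slices include both endpoints, while positional slices exclude the stop position.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['a'])              # single label
print(s.loc[['a', 'c', 'd']])  # list of labels returns a Series
print(s.loc['b':'d'])          # label slice: both endpoints included
print(s.iloc[1:4])             # positional slice: stop position excluded
```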



**DataFrame**
---




A pandas DataFrame is a two-dimensional data structure: essentially a table with rows and columns. The columns have names and the rows have index labels.

![Data Frame Structure](https://www.tutorialspoint.com/python_pandas/images/structure_table.jpg)



---

A pandas DataFrame can be created from various inputs, such as:

1. Lists

2. dict

3. Series

4. Numpy ndarrays

5. Another DataFrame
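
The five input kinds listed above can be sketched side by side as follows. The variable names and sample values are invented for this illustration.

```python
import numpy as np
import pandas as pd

from_list = pd.DataFrame([1, 2, 3])                    # list: one column
from_dict = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})   # dict: keys become columns
from_series = pd.DataFrame({'x': pd.Series([1, 2], index=['a', 'b'])})
from_array = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['p', 'q', 'r'])
from_frame = pd.DataFrame(from_dict)                   # another DataFrame

for frame in (from_list, from_dict, from_series, from_array, from_frame):
    print(frame.shape)
```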

**Create a DataFrame from Lists**

In [9]:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)

df

   0
0  1
1  2
2  3
3  4
4  5


**Example 2**

In [10]:
data = [['ram',10],['Shyam',12],['sita',13]]
df = pd.DataFrame(data,columns=['Name','Age'])

df

    Name  Age
0    ram   10
1  Shyam   12
2   sita   13


**Example 3**

In [11]:
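import pandas as pd
data = [['Ram',10],['Shyam',12],['Sita',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
# Convert only the numeric column; passing dtype=float to the constructor
# fails in recent pandas versions because 'Name' cannot be cast to float
df['Age'] = df['Age'].astype(float)
df

    Name   Age
0    Ram  10.0
1  Shyam  12.0
2   Sita  13.0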


**Create a DataFrame from Dict of ndarrays / Lists**

In [12]:
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)


df

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


In [13]:
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
df

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42


**Create a DataFrame from List of Dicts**

In [14]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
df

   a   b     c
0  1   2   NaN
1  5  10  20.0


In [15]:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)
df

   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
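
By default `from_dict` treats dict keys as column names. As a hedged sketch of the other common layout, `orient='index'` treats each key as a row label instead; the row labels `row_1` and `row_2` are made up for this example.

```python
import pandas as pd

data = {'row_1': [3, 'a'], 'row_2': [2, 'b']}

# orient='index' treats each dict key as a row label instead of a column name.
df = pd.DataFrame.from_dict(data, orient='index', columns=['col_1', 'col_2'])
print(df)
```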


**Select specific columns of your dataframe**

In [16]:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)
df['col_1']

0    3
1    2
2    1
3    0
Name: col_1, dtype: int64

In [17]:
df['col_1'].values

array([3, 2, 1, 0])

In [18]:
df['col_2']

0    a
1    b
2    c
3    d
Name: col_2, dtype: object

In [19]:
df['col_2'].values

array(['a', 'b', 'c', 'd'], dtype=object)
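
A single column label returns a Series, as shown above. Passing a list of labels returns a DataFrame instead, which the sketch below illustrates. It also uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})

single = df['col_1']              # one label: Series
subset = df[['col_2', 'col_1']]   # list of labels: DataFrame, in the order given

print(type(single).__name__)
print(type(subset).__name__)
print(df['col_1'].to_numpy())
```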

**Column Deletion**

In [20]:
# Build a small DataFrame, then delete columns from it
# using the del statement and the pop() method
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
   'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:\n",df)


# using the del statement
print ("\nDeleting the first column using DEL function:\n")
del df['one']

print(df)

# using the pop() method (it also returns the removed column)
print ("\nDeleting another column using POP function:\n")
df.pop('two')

print(df)

Our dataframe is:
    one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN

Deleting the first column using DEL function:

   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN

Deleting another column using POP function:

   three
a   10.0
b   20.0
c   30.0
d    NaN
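
Both `del` and `pop` modify the DataFrame in place. When the original should stay intact, `drop(columns=...)` returns a new DataFrame instead. This is a minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop() returns a new DataFrame; the original keeps all its columns.
trimmed = df.drop(columns=['one', 'two'])
print(trimmed.columns.tolist())
print(df.columns.tolist())

# pop() removes the column in place and returns it as a Series.
removed = df.pop('three')
print(removed.tolist())
```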


**Row Selection, Addition, and Deletion**

In [21]:
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df

Unnamed: 0,one,two
a,1.0,1
b,2.0,2
c,3.0,3
d,,4


**Selection by Label**


Rows can be selected by passing a row label to the `loc` indexer.

In [22]:
df.loc['a']

one    1.0
two    1.0
Name: a, dtype: float64

In [23]:
# Boolean mask: keep the rows where column 'one' is less than 3
# (df.loc[df.one < 3] is equivalent)
df.loc[df['one']<3]



   one  two
a  1.0    1
b  2.0    2


**Note:**




```
To access a column of a DataFrame you can use either df.column_name or df['column_name'].

In the example above, df.one and df['one'] are both valid.
Here the column names are one and two.

However, keep in mind that df.column_name is invalid if the column name is not a valid
Python identifier, for example when it contains two or more words separated by a space.
```
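
Attribute access also falls short when a column name matches an existing DataFrame attribute or method. The sketch below uses two hypothetical column names, `count` (which is also a DataFrame method) and `unit price` (which contains a space), to show why bracket access is the safer default.

```python
import pandas as pd

# 'count' is also the name of a DataFrame method; 'unit price' contains a space.
df = pd.DataFrame({'count': [1, 2], 'unit price': [9.5, 12.0]})

print(callable(df.count))         # attribute access finds the method, not the column
print(df['count'].tolist())       # bracket access always finds the column
print(df['unit price'].tolist())
```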



---

*Example*


In [24]:
d = {'one more' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two more' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df

   one more  two more
a       1.0         1
b       2.0         2
c       3.0         3
d       NaN         4




```
In the example above,

df.one more  ----  is invalid, so you should use:

df['one more']
```



In [25]:
df['one more']

a    1.0
b    2.0
c    3.0
d    NaN
Name: one more, dtype: float64

In [26]:
df.one more  # raises SyntaxError: a space is not allowed in attribute access

SyntaxError: invalid syntax

**Selection by integer location**

> Rows can be selected by passing an integer position to the `iloc` indexer.





In [27]:
df.iloc[3]

one more    NaN
two more    4.0
Name: d, dtype: float64
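
Besides a single integer, `iloc` accepts slices, lists of positions, and a row/column pair. A minimal sketch on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0]}, index=['a', 'b', 'c', 'd'])

print(df.iloc[1:3])      # rows at positions 1 and 2 (stop is excluded)
print(df.iloc[[0, -1]])  # first and last rows
print(df.iloc[2, 0])     # scalar at row position 2, column position 0
```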

**Addition of Rows**

In [0]:
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

In [29]:
df

   a  b
0  1  2
1  3  4


In [30]:
df2

   a  b
0  5  6
1  7  8


In [31]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])

df

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
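
The combined frame above keeps each input's original row labels, so 0 and 1 each appear twice. If fresh, unique labels are wanted, one option is `pd.concat` with `ignore_index=True`. The name `combined` is chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and numbers the rows 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```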


**Deletion of Rows**

In [32]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
df

   a  b
0  1  2
1  3  4


In [33]:
df2

   a  b
0  5  6
1  7  8


In [0]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])

In [35]:
df

   a  b
0  1  2
1  3  4
0  5  6
1  7  8


In [0]:
# Drop rows with label 0
df = df.drop(0)

In [37]:
df

   a  b
1  3  4
1  7  8
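
Because label 0 appears twice, `drop(0)` removes two rows in the output above. The sketch below contrasts that with renumbering the rows first, after which each label identifies exactly one row. The name `unique` is made up for this example.

```python
import pandas as pd

df = pd.concat([
    pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b']),
    pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b']),
])

# With repeated labels, drop(0) removes every row labelled 0.
print(df.drop(0))

# Renumbering first makes each label unique, so drop(0) removes one row.
unique = df.reset_index(drop=True)
print(unique.drop(0))
```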


**Add an Extra Column to a DataFrame**

In [0]:
df = pd.DataFrame({'col1': [22, 23],'col2': [0.5, 0.75]},index=['row1', 'row2'])

In [39]:
df

      col1  col2
row1    22  0.50
row2    23  0.75


In [0]:
df['col3'] = df['col1'] + df['col2']

In [41]:
df

      col1  col2   col3
row1    22  0.50  22.50
row2    23  0.75  23.75


In [0]:
df['col4']  = [1,2]

In [43]:
df

      col1  col2   col3  col4
row1    22  0.50  22.50     1
row2    23  0.75  23.75     2


In [0]:
df['col5']  = (111,222)

In [45]:
df

      col1  col2   col3  col4  col5
row1    22  0.50  22.50     1   111
row2    23  0.75  23.75     2   222
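
Bracket assignment, as used above, always appends the new column at the end and modifies the DataFrame in place. Two alternatives, sketched here under made-up column names: `assign` returns a new DataFrame, and `insert` places a column at a chosen position.

```python
import pandas as pd

df = pd.DataFrame({'col1': [22, 23], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])

# assign() returns a new DataFrame and leaves df unchanged.
with_sum = df.assign(col3=df['col1'] + df['col2'])
print(with_sum)

# insert() modifies df in place and places the column at position 0.
df.insert(0, 'col0', [1, 2])
print(df.columns.tolist())
```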
