## PANDAS: INTRODUCTION

### PANDAS: Data Manipulation

> It is often said that 80% of data analysis is spent on cleaning and preparing data. This section focuses on a small but important aspect of data manipulation and cleaning with Pandas.

### PANDAS: Data Structures

*There are two data structures in Pandas -*

* **Series -** It is a one-dimensional labeled array capable of holding any data type (e.g. integers, strings, floating point numbers, Python objects etc.). The axis labels are collectively referred to as the index.

* **Data Frame -** It is a two-dimensional labeled data structure with columns of potentially different types. We can think of it like a spreadsheet or SQL table, or a dictionary of Series objects that share one index.
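
As a rough illustration of the two structures described above, the following sketch builds one of each. The fruit names, prices and the `in_stock` column are made-up sample data, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
prices = pd.Series([100, 220, 450], index=['apple', 'banana', 'orange'])

frame = pd.DataFrame(
    {
        'price': [100, 220, 450],
        'in_stock': [True, False, True],
    },
    index=['apple', 'banana', 'orange'],
)

print(prices.ndim, frame.ndim)   # number of dimensions of each structure
print(frame.dtypes)              # every column keeps its own dtype
print(type(frame['price']))      # selecting one column yields a Series
```

Selecting a single column of the Data Frame gives back a Series, which is why the Series is covered first.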

### PANDAS: Series Data Structure

**Definition of Series data structure -**

pandas.core.series.Series(data, index, dtype, copy)

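**data:** the values; they can be an ndarray, list, dictionary, scalar constant etc.<br>
**index:** the axis labels; they must be hashable and the same length as the data, but they do not have to be unique (the default is 0, 1, ..., n-1)<br>
**dtype:** the data type of the values; if it is omitted, it is inferred from the data<br>
**copy:** whether to copy the input data; it only has an effect when the Series is defined from a one-dimensional ndarray or another Series<br>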
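
A minimal sketch of how the `dtype` and `index` parameters behave, assuming a recent pandas version. The variable names (`inferred`, `forced`, `length_error`) are illustrative only.

```python
import pandas as pd

# Without dtype, pandas infers the element type from the data.
inferred = pd.Series([1, 2, 3])

# With dtype, the values are converted to the requested type.
forced = pd.Series([1, 2, 3], dtype='float64')
print(inferred.dtype, forced.dtype)

# An index whose length differs from the data is rejected.
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```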

In [1]:
# importing required modules
import pandas as pd
import numpy as np

In [2]:
# creating an empty Series (an explicit dtype avoids the default-dtype warning)
s = pd.Series(dtype = 'float64')
print (s, len(s), type(s), id(s))

Series([], dtype: float64) 0 <class 'pandas.core.series.Series'> 2465717728592



In [3]:
# creating Series from ndarray
nddata = np.array(['aaa', 'bbb', 'ccc', 'ddd'])
print (nddata, type(nddata))
s = pd.Series(data = nddata)
print (s, type(s))

['aaa' 'bbb' 'ccc' 'ddd'] <class 'numpy.ndarray'>
0    aaa
1    bbb
2    ccc
3    ddd
dtype: object <class 'pandas.core.series.Series'>


In [7]:
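# copy = False: the Series shares the ndarray's data, so a later change to the array shows up in the Series
nddata = np.array([100, 200, 400, 500, 350])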
print (nddata, type(nddata))
s = pd.Series(data = nddata, copy = False)
print (s, type(s))
nddata[2] = 99999
print (nddata, type(nddata))
print (s, type(s))

[100 200 400 500 350] <class 'numpy.ndarray'>
0    100
1    200
2    400
3    500
4    350
dtype: int32 <class 'pandas.core.series.Series'>
[  100   200 99999   500   350] <class 'numpy.ndarray'>
0      100
1      200
2    99999
3      500
4      350
dtype: int32 <class 'pandas.core.series.Series'>


In [8]:
# copy = True: the Series gets its own copy, so a later change to the array does not affect it
nddata = np.array([100, 200, 400, 500, 350])
print (nddata, type(nddata))
s = pd.Series(data = nddata, copy = True)
print (s, type(s))
nddata[2] = 99999
print (nddata, type(nddata))
print (s, type(s))

[100 200 400 500 350] <class 'numpy.ndarray'>
0    100
1    200
2    400
3    500
4    350
dtype: int32 <class 'pandas.core.series.Series'>
[  100   200 99999   500   350] <class 'numpy.ndarray'>
0    100
1    200
2    400
3    500
4    350
dtype: int32 <class 'pandas.core.series.Series'>


In [12]:
# index labels may repeat; looking up a repeated label (101 here) returns every matching entry
nddata = np.array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg'])
print (nddata, type(nddata))
s = pd.Series(data = nddata, index = [100, 101, 130, 120, 101, 303, 404])
print (s, type(s))
print (s[100])
print (s[101])
print (s[404])

['aaa' 'bbb' 'ccc' 'ddd' 'eee' 'fff' 'ggg'] <class 'numpy.ndarray'>
100    aaa
101    bbb
130    ccc
120    ddd
101    eee
303    fff
404    ggg
dtype: object <class 'pandas.core.series.Series'>
aaa
101    bbb
101    eee
dtype: object
ggg
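
Because a repeated label changes the return type of a lookup, it can help to check for duplicates up front. The sketch below reuses the data from the cell above and relies on `Index.is_unique` and `Index.duplicated()`. It is one possible check, not the only one.

```python
import pandas as pd

s = pd.Series(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg'],
              index=[100, 101, 130, 120, 101, 303, 404])

# False here, because the label 101 appears twice.
print(s.index.is_unique)

# Labels that repeat an earlier label.
repeated = s.index[s.index.duplicated()]
print(list(repeated))

# A unique label gives a single value, a repeated label gives a Series.
print(type(s[100]), type(s[101]))
```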


In [14]:
nddata = np.array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg'])
print (nddata, type(nddata))
s = pd.Series(data = nddata, index = ['a', 'b', 'd', 'c', 'h', 'e', 'y'])
print (s, type(s))
# s.iloc[...] selects by position, s[...] selects by label
print (s.iloc[0], s['a'])
print (s.iloc[1], s['b'])
print (s.iloc[4], s['h'])

['aaa' 'bbb' 'ccc' 'ddd' 'eee' 'fff' 'ggg'] <class 'numpy.ndarray'>
a    aaa
b    bbb
d    ccc
c    ddd
h    eee
e    fff
y    ggg
dtype: object <class 'pandas.core.series.Series'>
aaa aaa
bbb bbb
eee eee
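
To make the difference between label and position explicit, pandas offers the `.loc` and `.iloc` accessors. The following is a small sketch on a shortened version of the data above.

```python
import pandas as pd

s = pd.Series(['aaa', 'bbb', 'ccc'], index=['a', 'b', 'd'])

by_label = s.loc['b']      # lookup by index label
by_position = s.iloc[1]    # lookup by integer position
last = s.iloc[-1]          # negative positions count from the end

print(by_label, by_position, last)
```

Using the accessors avoids any ambiguity about whether an integer is meant as a label or as a position.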


In [15]:
# creating Series from dictionary
dictdata = {'apple':100, 'banana':220, 'orange':450, 'pineapple':320}
print (dictdata, type(dictdata))
s = pd.Series(data = dictdata)
print (s, type(s))

{'apple': 100, 'banana': 220, 'orange': 450, 'pineapple': 320} <class 'dict'>
apple        100
banana       220
orange       450
pineapple    320
dtype: int64 <class 'pandas.core.series.Series'>


In [18]:
dictdata = {'apple':100, 'banana':220, 'orange':450, 'pineapple':320}
print (dictdata, type(dictdata))
s = pd.Series(data = dictdata, index = ['apple', 'orange', 'orange', 'apple', 'orange', 'pineapple', 'banana'])
print (s, type(s))
print (s['banana'])
print (s['orange'])
print (s['apple'])

{'apple': 100, 'banana': 220, 'orange': 450, 'pineapple': 320} <class 'dict'>
apple        100
orange       450
orange       450
apple        100
orange       450
pineapple    320
banana       220
dtype: int64 <class 'pandas.core.series.Series'>
220
orange    450
orange    450
orange    450
dtype: int64
apple    100
apple    100
dtype: int64
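
When an index is passed together with a dictionary, the values are picked by label. A label with no matching key ends up as a missing value. The sketch below shows this with a hypothetical label `'kiwi'` that is not in the dictionary.

```python
import pandas as pd

dictdata = {'apple': 100, 'banana': 220}

# 'kiwi' is a made-up label that has no matching key in dictdata.
s = pd.Series(data=dictdata, index=['banana', 'kiwi', 'apple'])

print(s)          # the entry for 'kiwi' is NaN
print(s.isna())   # True only for the missing entry
```

Note that the dtype becomes a floating point type here, because NaN cannot be stored in a plain integer Series.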


In [19]:
# creating Series from scalar
s = pd.Series(data = 5, index = [0, 1, 2, 3, 4, 5, 6])
print (s, type(s))

0    5
1    5
2    5
3    5
4    5
5    5
6    5
dtype: int64 <class 'pandas.core.series.Series'>


In [25]:
# creating Series from list
listdata = ['Monday', 'Friday', 'Saturday', 'Tuesday']
print (listdata, type(listdata))
s = pd.Series(data = listdata)
print (s, type(s))
s = pd.Series(data = listdata, index = ['1st', '2nd', '3rd', '4th'])
print (s, type(s))
s = pd.Series(data = listdata, index = ['3rd', '1st', '4th', '2nd'])
print (s, type(s))
print (s.sort_values())
print (s.sort_index())

['Monday', 'Friday', 'Saturday', 'Tuesday'] <class 'list'>
0      Monday
1      Friday
2    Saturday
3     Tuesday
dtype: object <class 'pandas.core.series.Series'>
1st      Monday
2nd      Friday
3rd    Saturday
4th     Tuesday
dtype: object <class 'pandas.core.series.Series'>
3rd      Monday
1st      Friday
4th    Saturday
2nd     Tuesday
dtype: object <class 'pandas.core.series.Series'>
1st      Friday
3rd      Monday
4th    Saturday
2nd     Tuesday
dtype: object
1st      Friday
2nd     Tuesday
3rd      Monday
4th    Saturday
dtype: object
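
Both `sort_values()` and `sort_index()` return a new Series and leave the original untouched. A short sketch, reusing the data from the cell above and adding the `ascending` parameter:

```python
import pandas as pd

s = pd.Series(['Monday', 'Friday', 'Saturday', 'Tuesday'],
              index=['3rd', '1st', '4th', '2nd'])

by_value_desc = s.sort_values(ascending=False)   # reverse alphabetical order of the values
by_label = s.sort_index()                        # alphabetical order of the labels

print(by_value_desc)
print(by_label)
print(s)   # the original order is unchanged
```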


In [32]:
listdata = ['Monday', 'Friday', 'Saturday', 'Tuesday', 'Wednesday', 'Sunday']
print (listdata, type(listdata))
s = pd.Series(data = listdata)
print (s, type(s))
print (s[2])
print (s[:3])
print (s[3:])
print (s[2:4])

['Monday', 'Friday', 'Saturday', 'Tuesday', 'Wednesday', 'Sunday'] <class 'list'>
0       Monday
1       Friday
2     Saturday
3      Tuesday
4    Wednesday
5       Sunday
dtype: object <class 'pandas.core.series.Series'>
Saturday
0      Monday
1      Friday
2    Saturday
dtype: object
3      Tuesday
4    Wednesday
5       Sunday
dtype: object
2    Saturday
3     Tuesday
dtype: object
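
Slicing by position excludes the end point, as the output above shows. Slicing by label with `.loc` includes it. The sketch below contrasts the two on a Series with string labels (the labels `'a'` to `'d'` are chosen only for illustration).

```python
import pandas as pd

s = pd.Series(['Monday', 'Friday', 'Saturday', 'Tuesday'],
              index=['a', 'b', 'c', 'd'])

positional = s.iloc[1:3]     # positions 1 and 2; position 3 is excluded
labelled = s.loc['b':'d']    # labels 'b' through 'd'; 'd' is included

print(positional)
print(labelled)
```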


### PANDAS: Data Frame Structure