# Introduction to Pandas

### Importing the pandas module

In [None]:
import pandas as pd

<br>

## Pandas Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

#### <ins>**Example 1**</ins>

The general form of the constructor is `pd.Series(data, index=index)`.

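Here, **data** can be many different things:

* a Python dict

* an ndarray

* a Python list (or another array-like)

* a scalar value (like 5)

The optional **index** is a list of axis labels. If you omit it, a dict's keys become the labels; for other data, pandas assigns a default integer index `0, 1, 2, ...`. If **data** is an ndarray or a list, the index must have the same length as the data.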
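A minimal sketch of three of these input kinds, with made-up labels and values. With a scalar, the value is repeated once for every label in `index`.

```python
import numpy as np
import pandas as pd

# dict: the keys become the index labels
from_dict = pd.Series({"a": 0, "b": 1, "c": 2})

# ndarray: the index, if given, must have the same length as the data
from_array = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

# scalar: the value is repeated for every label in the index
from_scalar = pd.Series(5, index=["a", "b", "c"])

print(from_dict)
print(from_array)
print(from_scalar)
```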

In [5]:
s = pd.Series([1, 3, 5, 5, 6, 8])

In [6]:
s

0    1
1    3
2    5
3    5
4    6
5    8
dtype: int64

In [7]:
s.values

array([1, 3, 5, 5, 6, 8])

In [8]:
s.index

RangeIndex(start=0, stop=6, step=1)

In [9]:
s[0]

1

In [11]:
s[0:2]

0    1
1    3
dtype: int64
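`s[0]` returns the first element here only because, with the default index, the label `0` and position 0 coincide. A small sketch showing that they can diverge: reversing the Series keeps each value attached to its original label, so `.loc` (by label) and `.iloc` (by position) return different values.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 5, 6, 8])

# Reversing keeps each value attached to its original label,
# so labels and positions no longer coincide.
s_rev = s.iloc[::-1]

print(s_rev.loc[0])   # value with label 0 -> 1
print(s_rev.iloc[0])  # value at position 0 -> 8
```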

<br>

#### <ins>**Example 2**</ins>

In [13]:
pd.Series([1, 3, 5, 6, 8], index=['a', 'b', 'c', 'd', 'e'])

a    1
b    3
c    5
d    6
e    8
dtype: int64

<br>

#### <ins>**Example 3**</ins>

In [14]:
d = {'b': 1, 'a': 0, 'c': 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

<br>

#### <ins>**Example 4**</ins>

In [None]:
d = {'a': 0., 'b': 1., 'c': 2.}
pd.Series(d)
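When both a dict and an `index` are passed, pandas looks the values up in the dict by label, in the order given by `index`; labels that are not keys of the dict get `NaN`. A sketch reusing the dict from Example 4 (the extra label `'d'` is added for illustration):

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}

# Values are looked up by label; 'd' is not a key, so it becomes NaN
s = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s)
```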

<br>

## Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, an SQL table, or a dict of Series objects.

DataFrame accepts many different kinds of input:

* Dict of 1D ndarrays, lists, dicts, or Series

* 2-D numpy.ndarray

* Structured or record ndarray

* A Series

* Another DataFrame
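A sketch of three of the input kinds listed above, using made-up data: a 2-D ndarray (where you supply the labels yourself), a single Series (which becomes one column named after the Series), and an existing DataFrame.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: supply column (and optionally row) labels yourself
arr = np.array([[1, 2], [3, 4], [5, 6]])
from_array = pd.DataFrame(arr, columns=["x", "y"], index=["r1", "r2", "r3"])

# Series: becomes a single column named after the Series
ser = pd.Series([10, 20, 30], name="score")
from_series = pd.DataFrame(ser)

# Another DataFrame: wrapped into a new DataFrame with the same contents
from_frame = pd.DataFrame(from_array)

print(from_array.shape)           # (3, 2)
print(list(from_series.columns))  # ['score']
```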

#### <ins>**Example 1**</ins>

In [16]:
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

pd.DataFrame(d)

|   | one | two |
|---|-----|-----|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |


#### <ins>**Example 2**</ins>

In [33]:
d1 = {'a': [1, 2], 'b': [3, 67], 'c': [12, 0]}

df2 = pd.DataFrame(d1, index=['x', 'y'])
df2

|   | a | b  | c  |
|---|---|----|----|
| x | 1 | 3  | 12 |
| y | 2 | 67 | 0  |


In [34]:
df2['a']

x    1
y    2
Name: a, dtype: int64

<br>

#### <ins>**Example 3**</ins>

In [35]:
cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}

df3 = pd.DataFrame(cars, columns=['Brand', 'Price'])
df3

|   | Brand          | Price |
|---|----------------|-------|
| 0 | Honda Civic    | 22000 |
| 1 | Toyota Corolla | 25000 |
| 2 | Ford Focus     | 27000 |
| 3 | Audi A4        | 35000 |


In [47]:
cars_df = pd.DataFrame(cars, columns=['Brand', 'Price'], index=['x', 'y', 'z', 't'])
cars_df

|   | Brand          | Price |
|---|----------------|-------|
| x | Honda Civic    | 22000 |
| y | Toyota Corolla | 25000 |
| z | Ford Focus     | 27000 |
| t | Audi A4        | 35000 |
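Besides labelling, the `columns` argument decides which dict keys become columns and in what order; a name that is not a key produces a column filled with `NaN`. A sketch with a shortened copy of `cars` (the `'Year'` column name is hypothetical):

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla'],
        'Price': [22000, 25000]}

# Reordered columns; 'Year' is not a key in cars, so it is filled with NaN
df = pd.DataFrame(cars, columns=['Price', 'Brand', 'Year'])
print(df)
```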


<br>

### loc & iloc

* **loc** is label-based: you select rows and columns by their index and column labels. Label slices include the end label.

* **iloc** is integer position-based: you select rows and columns by their integer positions, starting at 0. Position slices exclude the end position, as with Python lists.
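A sketch of both accessors on a rebuilt copy of `cars_df`. Each takes a row selector and an optional column selector, and either selector can be a single key, a list, or a slice; here the same two rows and one column are picked by label and by position.

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'Price': [22000, 25000, 27000, 35000]}
cars_df = pd.DataFrame(cars, index=['x', 'y', 'z', 't'])

# Label-based: rows 'y' and 't', column 'Brand'
by_label = cars_df.loc[['y', 't'], 'Brand']

# Position-based: the same rows (positions 1 and 3), column position 0
by_position = cars_df.iloc[[1, 3], 0]

print(by_label.equals(by_position))  # True
```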

In [48]:
cars_df.iloc[0,0]

'Honda Civic'

In [49]:
cars_df.iloc[0,1]

22000

In [50]:
cars_df.loc['x']

Brand    Honda Civic
Price          22000
Name: x, dtype: object

In [51]:
cars_df.loc['x', 'Price']

22000

<br>

## Importing a CSV file

In [85]:
tips = pd.read_csv("datasets/tips.csv")
tips

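|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun  | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun  | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun  | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun  | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun  | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ...  | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.00 | Female | No     | Thur | Dinner | 2    |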
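Because `datasets/tips.csv` may not be available everywhere, here is a self-contained sketch that passes `read_csv` a few rows copied from the output above through `io.StringIO`. Any file-like object can be used in place of a path; the header line is assumed to match the columns listed by `tips.columns`.

```python
import io

import pandas as pd

# First three rows of tips.csv, copied from the output above
csv_text = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (3, 7)
print(sample.dtypes)
```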


In [52]:
tips.head()

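|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |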


In [53]:
tips.tail()

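|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.0  | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.0  | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.0  | Female | No     | Thur | Dinner | 2    |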


In [56]:
tips.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

In [57]:
tips.index

RangeIndex(start=0, stop=244, step=1)

In [59]:
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 13.5+ KB


In [60]:
tips.shape

(244, 7)

In [61]:
tips

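|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun  | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun  | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun  | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun  | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun  | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ...  | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.00 | Female | No     | Thur | Dinner | 2    |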


In [63]:
tips.loc[:3,:"tip"]

|   | total_bill | tip  |
|---|------------|------|
| 0 | 16.99      | 1.01 |
| 1 | 10.34      | 1.66 |
| 2 | 21.01      | 3.5  |
| 3 | 23.68      | 3.31 |
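The four rows returned by `tips.loc[:3, :"tip"]` come from `.loc` including the end label of a slice; `.iloc` (and plain `tips[0:3]`) stop before the end position. A sketch comparing the two on a small stand-in frame built from the first rows of `tips`:

```python
import pandas as pd

df = pd.DataFrame({'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})

by_label = df.loc[:3, :'tip']  # rows labelled 0..3, columns up to and including 'tip'
by_position = df.iloc[:4, :2]  # positions 0..3 and 0..1: the same block

print(len(df.loc[:3]), len(df.iloc[:3]))  # 4 3
print(by_label.equals(by_position))       # True
```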


In [65]:
tips[["tip"]]

|     | tip  |
|-----|------|
| 0   | 1.01 |
| 1   | 1.66 |
| 2   | 3.50 |
| 3   | 3.31 |
| 4   | 3.61 |
| ... | ...  |
| 239 | 5.92 |
| 240 | 2.00 |
| 241 | 2.00 |
| 242 | 1.75 |
| 243 | 3.00 |


In [66]:
tips["tip"]

0      1.01
1      1.66
2      3.50
3      3.31
4      3.61
       ... 
239    5.92
240    2.00
241    2.00
242    1.75
243    3.00
Name: tip, Length: 244, dtype: float64
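The two cells above differ only in their brackets: a single column name returns a `Series`, while a list of names (even a one-element list) returns a `DataFrame`. A quick check on a stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'total_bill': [16.99, 10.34], 'tip': [1.01, 1.66]})

col = df['tip']      # Series named 'tip'
frame = df[['tip']]  # DataFrame with one column

print(type(col).__name__, col.shape)      # Series (2,)
print(type(frame).__name__, frame.shape)  # DataFrame (2, 1)
```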

In [67]:
tips[tips['time']=='Dinner']

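|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun  | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun  | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun  | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun  | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun  | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ...  | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.00 | Female | No     | Thur | Dinner | 2    |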
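`tips['time'] == 'Dinner'` produces a boolean Series, and indexing with it keeps the rows where it is `True`. Conditions can be combined with `&` (and), `|` (or) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `==` and `>=`. A sketch on hypothetical rows laid out like `tips`:

```python
import pandas as pd

# Hypothetical rows in the same layout as tips
df = pd.DataFrame({'total_bill': [16.99, 8.50, 24.59, 12.00],
                   'time': ['Dinner', 'Dinner', 'Dinner', 'Lunch'],
                   'sex': ['Female', 'Male', 'Female', 'Male']})

mask = (df['time'] == 'Dinner') & (df['total_bill'] >= 10)
print(mask.tolist())  # [True, False, True, False]

print(df[mask])
# The same filter with loc, also picking a column
print(df.loc[mask, ['sex']])
```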


In [76]:
tips.loc[tips.total_bill >= 10, ['sex']]

|     | sex    |
|-----|--------|
| 0   | Female |
| 1   | Male   |
| 2   | Male   |
| 3   | Male   |
| 4   | Female |
| ... | ...    |
| 239 | Male   |
| 240 | Female |
| 241 | Male   |
| 242 | Male   |
| 243 | Female |


In [80]:
tips[tips['total_bill'] >= 10]

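|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun  | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun  | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun  | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun  | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun  | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ...  | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.00 | Female | No     | Thur | Dinner | 2    |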


In [82]:
tips[0:3]

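|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |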


In [83]:
tips.describe()

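|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |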
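By default `describe()` summarises only the numeric columns, which is why `sex`, `smoker`, `day` and `time` are missing above. Text columns can be summarised with `include='all'`, or by calling `describe()` on a single text column, which reports `count`, `unique`, `top` and `freq`. A sketch on a stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'tip': [1.01, 1.66, 3.50],
                   'day': ['Sun', 'Sun', 'Sat']})

numeric_summary = df.describe()     # only the numeric 'tip' column
text_summary = df['day'].describe() # count, unique, top, freq

print(numeric_summary)
print(text_summary)
```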


In [84]:
tips.corr(numeric_only=True)  # pandas >= 2.0 raises on the text columns without numeric_only=True

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
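Since pandas 2.0, `DataFrame.corr()` no longer drops non-numeric columns silently, so on a frame like `tips` it raises an error unless `numeric_only=True` is passed or the numeric columns are selected first. A sketch of the second approach with `select_dtypes`, on a stand-in frame built from the first rows of `tips`:

```python
import pandas as pd

df = pd.DataFrame({'total_bill': [16.99, 10.34, 21.01, 23.68],
                   'tip': [1.01, 1.66, 3.50, 3.31],
                   'sex': ['Female', 'Male', 'Male', 'Male']})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric = df.select_dtypes(include='number')
corr = numeric.corr()
print(corr)
```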

