### Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#ref1">Creating Dataframe</a></li>
        <li><a href="#ref2">Creating Dataframe from CSV</a></li>
        <li><a href="#ref3">Creating Dataframe from Excel</a></li>
        <li><a href="#ref4">Display Data: Head and Tail</a></li>
        <li><a href="#ref5">Show Columns</a></li>
        <li><a href="#ref6">Show Data Types of columns</a></li>
        <li><a href="#ref7">Dataframe Indexing</a></li>
        <li><a href="#ref8">Converting a series to a dataframe</a></li>
    </ol>
</div>
<br>


<a id="ref1"></a>

## Creating Dataframe

In [1]:
import pandas as pd
import numpy as np

#### Creating dataframe from a list

In [2]:
# Calling the DataFrame constructor with no arguments creates an empty DataFrame
letters_df = pd.DataFrame()
print(letters_df)

# List of strings
lst_letters = ['a', 'b', 'c', 'd',
               'e', 'f', 'g']

# Calling the DataFrame constructor on the list creates a single column labeled 0
letters_df = pd.DataFrame(lst_letters)
letters_df

Empty DataFrame
Columns: []
Index: []


|   | 0 |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
| 3 | d |
| 4 | e |
| 5 | f |
| 6 | g |
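By default, the single column built from a list is labeled `0`. A minimal sketch using the constructor's `columns` parameter to give it a readable name instead (the label `letter` is just an illustrative choice):

```python
import pandas as pd

lst_letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# The columns parameter replaces the default integer label 0
letters_df = pd.DataFrame(lst_letters, columns=['letter'])
print(letters_df.head(3))
```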


#### Creating dataframe from a dictionary

In [3]:
states_data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002, 2003],
 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
states_df = pd.DataFrame(states_data)
states_df

|   | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |
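Each dictionary key becomes a column label, and the columns follow the key order. The constructor's `columns` parameter can reorder (or subset) them; a short sketch reusing `states_data`:

```python
import pandas as pd

states_data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
               'year': [2000, 2001, 2002, 2001, 2002, 2003],
               'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

# Put 'year' first; any keys not listed in columns would be left out
states_by_year = pd.DataFrame(states_data, columns=['year', 'state', 'pop'])
print(states_by_year.columns.tolist())  # ['year', 'state', 'pop']
```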


#### Creating dataframe from series

![df.PNG](attachment:df.PNG)

In [4]:
# Creating two Series: author and article:
author_series = pd.Series(['Jitender', 'Purnima','Arpit', 'Jyoti'])
article_series = pd.Series([210, 211, 114, 178])

# Creating a dictionary by passing Series objects as values
author_dict = {'Author': author_series,
         'Article': article_series}

# Creating DataFrame by passing Dictionary
author_df = pd.DataFrame(author_dict)
 
# Printing author Dataframe
author_df

|   | Author | Article |
|---|---|---|
| 0 | Jitender | 210 |
| 1 | Purnima | 211 |
| 2 | Arpit | 114 |
| 3 | Jyoti | 178 |
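When the dictionary values are Series, pandas aligns them on their index labels rather than by position. In the sketch below, `rating_series` and its values are hypothetical, and its index is deliberately shorter and out of order, so the positions it lacks are filled with NaN:

```python
import pandas as pd

author_series = pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'])
# Hypothetical Series with only two entries, labeled 3 and 0
rating_series = pd.Series([4.5, 3.9], index=[3, 0])

df = pd.DataFrame({'Author': author_series, 'Rating': rating_series})
print(df)
# Label 3 (Jyoti) gets 4.5 and label 0 (Jitender) gets 3.9; labels 1 and 2 get NaN
```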


<a id="ref2"></a>

## Creating Dataframe from CSV

We can create a DataFrame from a CSV file using the read_csv() function.

Note: The csv data can be downloaded from: https://datahub.io/machine-learning/iris.

In [1]:
import pandas as pd
 
# Reading the CSV file
iris_csv = pd.read_csv("../data/iris_csv.csv")
iris_csv

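|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |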
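read_csv() also accepts any file-like object, which is handy for experimenting without the data file. A minimal sketch, assuming a few rows copied inline into an `io.StringIO` buffer in place of `../data/iris_csv.csv`:

```python
import io
import pandas as pd

# A few rows in the same layout as the iris CSV (header line + data lines)
csv_text = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
6.2,3.4,5.4,2.3,Iris-virginica
"""

# read_csv treats the StringIO buffer like an open file
iris_sample = pd.read_csv(io.StringIO(csv_text))
print(iris_sample.shape)  # (3, 5)
```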


<a id="ref3"></a>

## Creating Dataframe from Excel

In [3]:
# Reading the Excel file
finance_excel = pd.read_excel('../data/Financial_Sample.xlsx')
finance_excel

|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Government | Canada | Carretera |  | 1618.5 | 3 | 20 | 32370.0 | 0.00 | 32370.00 | 16185.0 | 16185.00 | 2014-01-01 | 1 | January | 2014 |
| 1 | Government | Germany | Carretera |  | 1321.0 | 3 | 20 | 26420.0 | 0.00 | 26420.00 | 13210.0 | 13210.00 | 2014-01-01 | 1 | January | 2014 |
| 2 | Midmarket | France | Carretera |  | 2178.0 | 3 | 15 | 32670.0 | 0.00 | 32670.00 | 21780.0 | 10890.00 | 2014-06-01 | 6 | June | 2014 |
| 3 | Midmarket | Germany | Carretera |  | 888.0 | 3 | 15 | 13320.0 | 0.00 | 13320.00 | 8880.0 | 4440.00 | 2014-06-01 | 6 | June | 2014 |
| 4 | Midmarket | Mexico | Carretera |  | 2470.0 | 3 | 15 | 37050.0 | 0.00 | 37050.00 | 24700.0 | 12350.00 | 2014-06-01 | 6 | June | 2014 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 695 | Small Business | France | Amarilla | High | 2475.0 | 260 | 300 | 742500.0 | 111375.00 | 631125.00 | 618750.0 | 12375.00 | 2014-03-01 | 3 | March | 2014 |
| 696 | Small Business | Mexico | Amarilla | High | 546.0 | 260 | 300 | 163800.0 | 24570.00 | 139230.00 | 136500.0 | 2730.00 | 2014-10-01 | 10 | October | 2014 |
| 697 | Government | Mexico | Montana | High | 1368.0 | 5 | 7 | 9576.0 | 1436.40 | 8139.60 | 6840.0 | 1299.60 | 2014-02-01 | 2 | February | 2014 |
| 698 | Government | Canada | Paseo | High | 723.0 | 10 | 7 | 5061.0 | 759.15 | 4301.85 | 3615.0 | 686.85 | 2014-04-01 | 4 | April | 2014 |


<a id="ref4"></a>

## Display Data: Head and Tail

In [6]:
# Printing top 5 rows
iris_csv.head()

|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [7]:
# Printing last 5 rows
iris_csv.tail()

|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |


In [9]:
finance_excel.head()

|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Government | Canada | Carretera |  | 1618.5 | 3 | 20 | 32370.0 | 0.0 | 32370.0 | 16185.0 | 16185.0 | 2014-01-01 | 1 | January | 2014 |
| 1 | Government | Germany | Carretera |  | 1321.0 | 3 | 20 | 26420.0 | 0.0 | 26420.0 | 13210.0 | 13210.0 | 2014-01-01 | 1 | January | 2014 |
| 2 | Midmarket | France | Carretera |  | 2178.0 | 3 | 15 | 32670.0 | 0.0 | 32670.0 | 21780.0 | 10890.0 | 2014-06-01 | 6 | June | 2014 |
| 3 | Midmarket | Germany | Carretera |  | 888.0 | 3 | 15 | 13320.0 | 0.0 | 13320.0 | 8880.0 | 4440.0 | 2014-06-01 | 6 | June | 2014 |
| 4 | Midmarket | Mexico | Carretera |  | 2470.0 | 3 | 15 | 37050.0 | 0.0 | 37050.0 | 24700.0 | 12350.0 | 2014-06-01 | 6 | June | 2014 |


In [10]:
# Printing top 10 rows
iris_csv.head(10)

|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |


In [11]:
# Printing last 20 rows
iris_csv.tail(20)

|   | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 130 | 7.4 | 2.8 | 6.1 | 1.9 | Iris-virginica |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 | Iris-virginica |
| 132 | 6.4 | 2.8 | 5.6 | 2.2 | Iris-virginica |
| 133 | 6.3 | 2.8 | 5.1 | 1.5 | Iris-virginica |
| 134 | 6.1 | 2.6 | 5.6 | 1.4 | Iris-virginica |
| 135 | 7.7 | 3.0 | 6.1 | 2.3 | Iris-virginica |
| 136 | 6.3 | 3.4 | 5.6 | 2.4 | Iris-virginica |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 | Iris-virginica |
| 138 | 6.0 | 3.0 | 4.8 | 1.8 | Iris-virginica |
| 139 | 6.9 | 3.1 | 5.4 | 2.1 | Iris-virginica |
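head(n) and tail(n) behave like positional slices, and head() also accepts a negative n, meaning "all rows except the last |n|". A sketch on a small made-up frame rather than the iris data:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

# head(3) returns the same rows as iloc[:3]; tail(3) the same as iloc[-3:]
assert df.head(3).equals(df.iloc[:3])
assert df.tail(3).equals(df.iloc[-3:])

# A negative n in head() drops that many rows from the end
print(df.head(-7)['x'].tolist())  # [0, 1, 2]
```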


<a id="ref5"></a>

## Show Columns

In [10]:
# The columns property returns the column labels of the DataFrame as an Index object
iris_csv.columns

Index(['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'class'], dtype='object')
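Because columns is an Index object rather than a list, it is often converted with tolist(); the related shape attribute gives the row and column counts together. A sketch on a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'sepallength': [5.1, 4.9],
                   'class': ['Iris-setosa', 'Iris-setosa']})

col_list = df.columns.tolist()   # ['sepallength', 'class']
n_rows, n_cols = df.shape        # 2 rows, 2 columns
print(col_list, n_rows, n_cols)
```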

<a id="ref6"></a>

## Show Data Types of Columns

In [13]:
iris_csv.dtypes

sepallength    float64
sepalwidth     float64
petallength    float64
petalwidth     float64
class           object
dtype: object

In [14]:
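# Convert sepalwidth to integers: the fractional part is truncated (3.5 -> 3),
# and a new Series is returned, so iris_csv itself is not modified
iris_csv['sepalwidth'].astype(int)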

0      3
1      3
2      3
3      3
4      3
      ..
145    3
146    2
147    3
148    3
149    3
Name: sepalwidth, Length: 150, dtype: int32
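Two details of astype() are easy to check on a small example: casting floats to integers truncates the fractional part, and the result is a new Series, so the column only changes if the result is assigned back. The integer width picked by plain int may differ by platform and library version (int32 above), so this sketch asks for 'int64' explicitly:

```python
import pandas as pd

df = pd.DataFrame({'sepalwidth': [3.5, 3.0, 2.5, 3.9]})

widths_int = df['sepalwidth'].astype('int64')
print(widths_int.tolist())        # [3, 3, 2, 3]
print(df['sepalwidth'].dtype)     # float64: the original column is unchanged

# Assign the result back to actually change the column's type
df['sepalwidth'] = df['sepalwidth'].astype('int64')
print(df.dtypes)
```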

<a id="ref7"></a>

## Dataframe Indexing

![loc.PNG](attachment:loc.PNG)

In [15]:
np.arange(4).reshape(1, 4)

array([[0, 1, 2, 3]])

In [17]:
states_new = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

In [18]:
states_new

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


### Indexing operator [ ] and [ [ ] ]

##### Single square brackets [ ] return a Series

These operators select all data from one or more columns of a pandas DataFrame:

In [18]:
# This results in a pandas Series:
type(states_new['one']) # note: only a single square bracket

pandas.core.series.Series

In [23]:
states_new['one']

Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32

In [19]:
type(states_new[['one']])

pandas.core.frame.DataFrame

In [19]:
states_new[['one']]

|   | one |
|---|---|
| Ohio | 0 |
| Colorado | 4 |
| Utah | 8 |
| New York | 12 |


In [20]:
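# Inside single brackets, 'one', 'three' is read as ONE key: the tuple ('one', 'three').
# No column has that label, so pandas raises a KeyError.
# To select several columns, pass a list instead (double brackets).
states_new['one', 'three'] # ERROR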

KeyError: ('one', 'three')

In [22]:
# Chained indexing: [column] returns a Series, then [row] picks one value from it
#         [column]  [row]
states_new['three']['Utah']

10
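The chained form performs two separate lookups. The same value can be read in one step with .loc[row, column] (covered in the loc section) or .at[row, column]; the pandas documentation advises against chained indexing, especially for assignment. A sketch that rebuilds states_new:

```python
import numpy as np
import pandas as pd

states_new = pd.DataFrame(np.arange(16).reshape((4, 4)),
                          index=['Ohio', 'Colorado', 'Utah', 'New York'],
                          columns=['one', 'two', 'three', 'four'])

chained = states_new['three']['Utah']     # column first, then row
by_loc = states_new.loc['Utah', 'three']  # row first, then column
by_at = states_new.at['Utah', 'three']    # scalar access by label
print(chained, by_loc, by_at)             # 10 10 10

# Assign through a single indexer rather than states_new['three']['Utah'] = ...
states_new.loc['Utah', 'three'] = 100
```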

##### Double square brackets [ [ ] ] return a DataFrame

In [20]:
# This results in a pandas dataframe:
states_new[['one']] # note: double square brackets

|   | one |
|---|---|
| Ohio | 0 |
| Colorado | 4 |
| Utah | 8 |
| New York | 12 |


In [21]:
# Note: a DataFrame can hold any number of columns,
# so the inner list can name more than one column
states_new[['one', 'three']]

|   | one | three |
|---|---|---|
| Ohio | 0 | 2 |
| Colorado | 4 | 6 |
| Utah | 8 | 10 |
| New York | 12 | 14 |


In [22]:
# Columns can be in any order
states_new[['one', 'three', 'two']]

|   | one | three | two |
|---|---|---|---|
| Ohio | 0 | 2 | 1 |
| Colorado | 4 | 6 | 5 |
| Utah | 8 | 10 | 9 |
| New York | 12 | 14 | 13 |


### Attribute Operator

In [23]:
# To select only one column of a dataframe, we can access it directly by its name as an attribute:
states_new.one # output will be a series

Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32

In [24]:
# The piece of code above is equivalent to states_new['one']:
states_new['one'] # output will be a series

Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32
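Attribute access only works when the column label is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A sketch with two made-up column names: 'Units Sold' (contains a space) and 'count' (also the name of the DataFrame.count() method):

```python
import pandas as pd

df = pd.DataFrame({'Units Sold': [10, 20], 'count': [1, 2]})

# 'Units Sold' contains a space, so attribute syntax cannot reach it; use brackets
units = df['Units Sold']

# df.count is the DataFrame.count method, not the 'count' column
print(callable(df.count))      # True
print(df['count'].tolist())    # [1, 2]
```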

### Selection with loc and iloc

#### .loc is used for label indexing

In [27]:
states_new.loc[:, :]

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [25]:
#             [row   , column]
states_new.loc['Ohio', 'one']

0

In [26]:
#             [row   , column]
# Label slices include the end label, so 'Utah' is part of the result
states_new.loc[:'Utah', 'two']

Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int32

In [28]:
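# ,        separates the row selector from the column selector
# 'a':'b'  selects a range of labels, including both ends
# 'a':     selects from label 'a' to the end
# :        on its own selects everything along that axis
states_new.loc['Colorado':, 'one':'three']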

|   | one | two | three |
|---|---|---|---|
| Colorado | 4 | 5 | 6 |
| Utah | 8 | 9 | 10 |
| New York | 12 | 13 | 14 |


In [31]:
#              row label ,   column label
# A single row label with a list of column labels (or vice versa) returns a Series;
# a single label on both axes returns a scalar
states_new.loc['Colorado', ['two', 'three']]

two      5
three    6
Name: Colorado, dtype: int32
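Whether .loc returns a Series or a DataFrame depends on whether each selector is a single label or a list/slice. Wrapping the single row label in a list keeps the result two-dimensional; a sketch on states_new:

```python
import numpy as np
import pandas as pd

states_new = pd.DataFrame(np.arange(16).reshape((4, 4)),
                          index=['Ohio', 'Colorado', 'Utah', 'New York'],
                          columns=['one', 'two', 'three', 'four'])

as_series = states_new.loc['Colorado', ['two', 'three']]   # single row label
as_frame = states_new.loc[['Colorado'], ['two', 'three']]  # list with one label
scalar = states_new.loc['Colorado', 'two']                 # single label on both axes

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
print(as_frame.shape)                                       # (1, 2)
```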

#### .iloc is used for integer position-based indexing

In [32]:
states_new.head()

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [33]:
# row position (data for Utah)
states_new.iloc[2]

one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

In [34]:
states_new.head()

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [35]:
#   row position , column position
states_new.iloc[2, [3, 0, 1]]

four    11
one      8
two      9
Name: Utah, dtype: int32

In [36]:
states_new.head()

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [37]:
#               row position, column position
states_new.iloc[   [1, 2]   ,   [3, 0, 1]    ]

|   | four | one | two |
|---|---|---|---|
| Colorado | 7 | 4 | 5 |
| Utah | 11 | 8 | 9 |
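One practical difference between the two indexers is where slices stop: a .loc label slice includes its end label, while an .iloc position slice excludes its end position, as with Python lists. .iloc also accepts negative positions counted from the end. A sketch on states_new:

```python
import numpy as np
import pandas as pd

states_new = pd.DataFrame(np.arange(16).reshape((4, 4)),
                          index=['Ohio', 'Colorado', 'Utah', 'New York'],
                          columns=['one', 'two', 'three', 'four'])

by_label = states_new.loc['Ohio':'Utah']   # includes 'Utah': 3 rows
by_position = states_new.iloc[0:2]         # excludes position 2: 2 rows
last_row = states_new.iloc[-1]             # the 'New York' row

print(by_label.index.tolist())     # ['Ohio', 'Colorado', 'Utah']
print(by_position.index.tolist())  # ['Ohio', 'Colorado']
print(last_row.name)               # New York
```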


<a id="ref8"></a>

## Converting a single series to a dataframe

In [2]:
items_series = pd.Series(['Computer', 'Printer', 'Tablet', 'Desk', 'Chair'])

In [3]:
print(items_series)

0    Computer
1     Printer
2      Tablet
3        Desk
4       Chair
dtype: object


In [4]:
print(type(items_series))

<class 'pandas.core.series.Series'>


In [5]:
items_df = items_series.to_frame(name="Products")

In [6]:
items_df.head()

|   | Products |
|---|---|
| 0 | Computer |
| 1 | Printer |
| 2 | Tablet |
| 3 | Desk |
| 4 | Chair |


In [7]:
print(type(items_df))

<class 'pandas.core.frame.DataFrame'>
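to_frame() is not the only route from a Series to a DataFrame. A sketch of two alternatives, reusing items_series and the column name Products: wrapping the Series in a dictionary, and reset_index(), which additionally turns the index into a regular column named index:

```python
import pandas as pd

items_series = pd.Series(['Computer', 'Printer', 'Tablet', 'Desk', 'Chair'])

# Option 1: wrap the Series in a dict; the key becomes the column label
df_dict = pd.DataFrame({'Products': items_series})

# Option 2: name the Series, then move its index into a column
df_reset = items_series.rename('Products').reset_index()

print(df_dict.columns.tolist())    # ['Products']
print(df_reset.columns.tolist())   # ['index', 'Products']
```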


### Task 1:

Select the first 5 rows of the finance_excel DataFrame.

Double-click <b>here</b> for the solution.

<!-- Solution is below:

finance_excel.head()
# or:
# finance_excel.loc[:4]
# or:
# finance_excel.loc[0:4]
# or:
# finance_excel.loc[[0, 1, 2, 3, 4]]
# or:
# finance_excel.iloc[0:5]

-->

### Task 2:

Select the last 10 rows of the finance_excel DataFrame.

Double-click <b>here</b> for the solution.

<!-- Solution is below:

finance_excel.iloc[-10:]
# or:
# finance_excel.tail(10)

-->

### Task 3:

Select the "Units Sold", "Gross Sales", "Sales" and "Profit" columns of the finance_excel DataFrame.

Double-click <b>here</b> for the solution.

<!-- Solution is below:

finance_excel.loc[:, ["Units Sold", "Gross Sales", "Sales", "Profit"]]

# or:
# finance_excel.iloc[:, [4, 7, 9, 11]]

-->

### Task 4:

Select the "Country", "Product" and "Discount Band" columns and rows 690 to 695 of the finance_excel DataFrame.

Double-click <b>here</b> for the solution.

<!-- Solution is below:

finance_excel.loc[690:695, ["Country", "Product", "Discount Band"]]

# or (iloc slices exclude the end position, hence 696):
# finance_excel.iloc[690:696, 1:4]

-->