
**pandas: a versatile and high-performance Python library for data manipulation, analysis, and discovery**

In [1]:
# import numpy and pandas, and DataFrame / Series
import numpy as np
import pandas as pd
from pandas import DataFrame, Series

In [2]:
# set some pandas options
pd.set_option('display.notebook_repr_html',False)
pd.set_option('display.max_columns',10)
pd.set_option('display.max_rows',10)

In [3]:
# Some items for matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
# pd.options.display.mpl_style='default'

**The pandas Series object**

In [4]:
# Create a four-item Series
s=Series([1,2,3,4])
s

0    1
1    2
2    3
3    4
dtype: int64

In [5]:
#  Return a Series with the rows with labels 1, 3
s[[1,3]]

1    2
3    4
dtype: int64

It is important to note that the lookup here is not by zero-based
positions 1 and 3 like an array, but by the values in the index.
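
To make the distinction concrete, here is a minimal sketch using a Series whose integer labels are deliberately out of order. The name `shuffled` and its values are made up for illustration. `.loc` always looks up by label and `.iloc` always by position, so the two return different rows:

```python
import pandas as pd

# hypothetical Series: the labels 3, 2, 1, 0 do not match the positions 0, 1, 2, 3
shuffled = pd.Series([1, 2, 3, 4], index=[3, 2, 1, 0])

by_label = shuffled.loc[[1, 3]]      # rows whose index labels are 1 and 3
by_position = shuffled.iloc[[1, 3]]  # rows at zero-based positions 1 and 3

print(by_label)
print(by_position)
```

With the default index 0, 1, 2, 3 the two lookups happen to agree, which is why the difference is easy to miss.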

In [6]:
#create a series using an explicit index
s=Series([1,2,3,4], index=['a','b','c','d'])
s

a    1
b    2
c    3
d    4
dtype: int64

In [7]:
# look up items in the series having index labels 'a' and 'd'
s[['a','d']]

a    1
d    4
dtype: int64

In [8]:
# Elements can still be referenced by numerical position. Using .iloc makes the
# positional lookup explicit; plain s[[1,3]] relies on a fallback that newer
# pandas versions deprecate.
s.iloc[[1,3]]

b    2
d    4
dtype: int64

In [9]:
# getting only index
s.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [10]:
# create an index of dates between the two specified dates (inclusive)
dates=pd.date_range('2015-08-01','2015-08-06')
dates

DatetimeIndex(['2015-08-01', '2015-08-02', '2015-08-03', '2015-08-04',
               '2015-08-05', '2015-08-06'],
              dtype='datetime64[ns]', freq='D')

In [11]:
temp1=Series([23,24,21,25,22,26],index=dates)

In [12]:
temp1

2015-08-01    23
2015-08-02    24
2015-08-03    21
2015-08-04    25
2015-08-05    22
2015-08-06    26
Freq: D, dtype: int64

In [13]:
# calculate the mean of the series
temp1.mean()

23.5

Arithmetic operations can be applied between two Series objects; values are matched by index label.
The following code calculates the difference in temperature between two Series.

In [14]:
# create a series with the same index
temp2=Series([22,27,24,21,25,20],index=dates)
temp2

2015-08-01    22
2015-08-02    27
2015-08-03    24
2015-08-04    21
2015-08-05    25
2015-08-06    20
Freq: D, dtype: int64

In [15]:
temp_diff=temp2-temp1
temp_diff

2015-08-01   -1
2015-08-02    3
2015-08-03    3
2015-08-04   -4
2015-08-05    3
2015-08-06   -6
Freq: D, dtype: int64
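
The subtraction above works cleanly because both Series share the same index. As a minimal sketch of what happens when they do not (the Series `a` and `b` and their labels are invented for illustration), pandas aligns on the union of the labels and fills positions that exist in only one operand with NaN:

```python
import pandas as pd

# two hypothetical Series whose indexes only partly overlap
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['z', 'y', 'w'])

# values are paired by label, not by position
total = a + b
print(total)
```

Only the labels `y` and `z` appear in both operands, so only those two rows hold numbers; `w` and `x` come back as NaN.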

In [16]:
# look up a value by date using the index
temp_diff['2015-08-03']

3

In [17]:
# also possible by integer position, made explicit with .iloc
temp_diff.iloc[2]

3

**The pandas DataFrame object**

In [18]:
# Create a DataFrame from two Series objects
temps_dif=DataFrame({'Lahore':temp1,"karachi":temp2})
temps_dif

            Lahore  karachi
2015-08-01      23       22
2015-08-02      24       27
2015-08-03      21       24
2015-08-04      25       21
2015-08-05      22       25
2015-08-06      26       20

In [19]:
# get the column with the name Lahore
temps_dif['Lahore']

2015-08-01    23
2015-08-02    24
2015-08-03    21
2015-08-04    25
2015-08-05    22
2015-08-06    26
Freq: D, Name: Lahore, dtype: int64

In [20]:
temps_dif['karachi']

2015-08-01    22
2015-08-02    27
2015-08-03    24
2015-08-04    21
2015-08-05    25
2015-08-06    20
Freq: D, Name: karachi, dtype: int64

In [21]:
temps_dif[['karachi','Lahore']]

            karachi  Lahore
2015-08-01       22      23
2015-08-02       27      24
2015-08-03       24      21
2015-08-04       21      25
2015-08-05       25      22
2015-08-06       20      26

In [22]:
# retrieve the Lahore column through the property syntax
temps_dif.Lahore

2015-08-01    23
2015-08-02    24
2015-08-03    21
2015-08-04    25
2015-08-05    22
2015-08-06    26
Freq: D, Name: Lahore, dtype: int64

In [23]:
# Calculate the temperature difference using property syntax
temp_dif_cities=temps_dif.karachi-temps_dif.Lahore
temp_dif_cities

2015-08-01   -1
2015-08-02    3
2015-08-03    3
2015-08-04   -4
2015-08-05    3
2015-08-06   -6
Freq: D, dtype: int64

In [24]:
# add a column to temps_dif that contains the temperature difference
temps_dif['Difference']=temp_dif_cities
temps_dif

            Lahore  karachi  Difference
2015-08-01      23       22          -1
2015-08-02      24       27           3
2015-08-03      21       24           3
2015-08-04      25       21          -4
2015-08-05      22       25           3
2015-08-06      26       20          -6

In [25]:
# get the column labels, which are also an Index object
temps_dif.columns

Index(['Lahore', 'karachi', 'Difference'], dtype='object')

In [26]:
# slice the Difference column at positions 1 through 3 (the stop position 4 is excluded)
temps_dif.Difference[1:4]

2015-08-02    3
2015-08-03    3
2015-08-04   -4
Freq: D, Name: Difference, dtype: int64

In [27]:
# get the row at array position 1
temps_dif.iloc[1]

Lahore        24
karachi       27
Difference     3
Name: 2015-08-02 00:00:00, dtype: int64

This converts the row into a Series, with the column names of the DataFrame
pivoted into the index labels of the resulting Series.

In [28]:
temps_dif.iloc[1].index

Index(['Lahore', 'karachi', 'Difference'], dtype='object')
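
The conversion to a Series happens because a single integer was passed to `.iloc`. A minimal sketch, using a small made-up DataFrame `df`, contrasts that with passing a one-element list, which keeps the result as a DataFrame:

```python
import pandas as pd

# hypothetical two-row frame used only for this illustration
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

row_series = df.iloc[1]    # single position: a Series indexed by the column names
row_frame = df.iloc[[1]]   # list of positions: a one-row DataFrame

print(type(row_series).__name__, list(row_series.index))
print(type(row_frame).__name__, row_frame.shape)
```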

Rows can be explicitly accessed via index label using the .loc property.

In [29]:
temps_dif.loc['2015-08-03']

Lahore        21
karachi       24
Difference     3
Name: 2015-08-03 00:00:00, dtype: int64
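
`.loc` also accepts a slice of labels. A minimal sketch follows, rebuilding a one-column frame named `temps` from the Lahore values above so the block runs on its own. Note that a label slice includes both endpoints, whereas a positional `.iloc` slice excludes the stop position:

```python
import pandas as pd

dates = pd.date_range('2015-08-01', '2015-08-06')
temps = pd.DataFrame({'Lahore': [23, 24, 21, 25, 22, 26]}, index=dates)

by_label = temps.loc['2015-08-02':'2015-08-04']  # both endpoint labels are included
by_position = temps.iloc[1:4]                    # position 4 is excluded

print(by_label)
print(by_position)
```

Both expressions select the same three rows here, but for different reasons.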

In [30]:
# get the values in the Difference column in rows 1, 3, and 5 using 0-based location
temps_dif.iloc[[1,3,5]].Difference

2015-08-02    3
2015-08-04   -4
2015-08-06   -6
Freq: 2D, Name: Difference, dtype: int64

In [31]:
# which values in the Lahore column are greater than 22?
temps_dif.Lahore>22

2015-08-01     True
2015-08-02     True
2015-08-03    False
2015-08-04     True
2015-08-05    False
2015-08-06     True
Freq: D, Name: Lahore, dtype: bool

In [32]:
# use the boolean Series to select only the rows where Lahore is greater than 22
temps_dif[temps_dif.Lahore>22]

            Lahore  karachi  Difference
2015-08-01      23       22          -1
2015-08-02      24       27           3
2015-08-04      25       21          -4
2015-08-06      26       20          -6
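
Boolean selections can also be combined. The sketch below rebuilds the two city columns under the name `temps` so it is self-contained; the thresholds are arbitrary choices for illustration:

```python
import pandas as pd

dates = pd.date_range('2015-08-01', '2015-08-06')
temps = pd.DataFrame({'Lahore': [23, 24, 21, 25, 22, 26],
                      'karachi': [22, 27, 24, 21, 25, 20]}, index=dates)

# each comparison needs its own parentheses because & binds tighter than >
both_warm = temps[(temps.Lahore > 22) & (temps.karachi > 21)]
print(both_warm)
```

Use `&` and `|` rather than `and` and `or`; the Python keywords do not operate element-wise on a Series.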

**Loading data from files and the Web**

Loading CSV data from files

In [33]:
!cat data/test1.csv

cat: data/test1.csv: No such file or directory


In [34]:
! cat home/sajid/1/H10_xray.csv

cat: home/sajid/1/H10_xray.csv: No such file or directory


In [35]:
data=pd.read_csv('https://raw.githubusercontent.com/sajid-munawar/Quarter_2/master/Pandas/Learn%20from%20portal/examples/csv_mindex.csv')

In [36]:
data

  key1 key2  value1  value2
0  one    a       1       2
1  one    b       3       4
2  one    c       5       6
3  one    d       7       8
4  two    a       9      10
5  two    b      11      12
6  two    c      13      14
7  two    d      15      16

In [37]:
data.index

RangeIndex(start=0, stop=8, step=1)

In [38]:
from google.colab import files
uploaded=files.upload()

Saving ex2.csv to ex2.csv


In [39]:
type(uploaded)

dict

In [41]:
# pd.read_csv(uploaded) would fail here: uploaded is a dict of {filename: bytes},
# not a file path or file-like object
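
Because `files.upload()` returns a dict that maps each filename to the file's raw bytes, one way to load an upload is to wrap those bytes in `io.BytesIO`. The sketch below does not call Colab at all: it uses a stand-in dict with invented CSV content, since the real contents of `ex2.csv` are not shown in this notebook.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload(); the CSV bytes are invented.
uploaded = {'ex2.csv': b'a,b,c\n1,2,3\n4,5,6\n'}

# read every uploaded file into its own DataFrame, keyed by filename
frames = {name: pd.read_csv(io.BytesIO(content))
          for name, content in uploaded.items()}
ex2 = frames['ex2.csv']
print(ex2)
```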

In [51]:
date=pd.date_range('05-01-2021','05-05-2021')
date2=pd.date_range('2021-10-01','2021-10-05')
date2

DatetimeIndex(['2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04',
               '2021-10-05'],
              dtype='datetime64[ns]', freq='D')

In [52]:
date

DatetimeIndex(['2021-05-01', '2021-05-02', '2021-05-03', '2021-05-04',
               '2021-05-05'],
              dtype='datetime64[ns]', freq='D')

In [59]:
date2=pd.date_range('2021-10-01','2021-10-05')
nums=np.random.randint(1,100,5)
df3=DataFrame({'date':date2, 'Class one':nums})


In [60]:
nums

array([90, 14, 30,  1, 11])

In [61]:
df1=DataFrame(nums,index=date2)

In [62]:
df1

             0
2021-10-01  90
2021-10-02  14
2021-10-03  30
2021-10-04   1
2021-10-05  11

In [63]:
df2=DataFrame({'Class one' :nums},index=date2)

In [64]:
df2

            Class one
2021-10-01         90
2021-10-02         14
2021-10-03         30
2021-10-04          1
2021-10-05         11

In [71]:
df3=DataFrame({'date':date2, 'Class one':nums})
df3.to_csv()

',date,Class one\n0,2021-10-01,90\n1,2021-10-02,14\n2,2021-10-03,30\n3,2021-10-04,1\n4,2021-10-05,11\n'
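
Called without a path, `to_csv()` returns the CSV text as a string, and the leading comma in the header above is the unnamed column holding the default integer index. A minimal sketch, assuming that index is not wanted in the output, drops it with `index=False` and reads the text back through `io.StringIO`; the values reuse the random numbers shown earlier:

```python
import io
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2021-10-01', '2021-10-05'),
                   'Class one': [90, 14, 30, 1, 11]})

csv_text = df.to_csv(index=False)  # omit the 0..4 index column
# parse_dates turns the 'date' column back into datetimes instead of strings
restored = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(csv_text)
print(restored.dtypes)
```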

**Loading Data from the web**

In [76]:
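# pandas.io.data was removed from pandas; DataReader is now provided by the
# separate pandas-datareader package (pip install pandas-datareader)
from pandas_datareader.data import DataReader
from datetime import date
from dateutil.relativedelta import relativedelta

# request the last three months of GOOG quotes; the "yahoo" source depends on
# an external service and may no longer respond
goog = DataReader("GOOG",
                  "yahoo",
                  date.today() + relativedelta(months=-3))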