# Pandas

Data analytics is the process of analyzing large sets of data points to answer questions about that dataset. Pandas is a Python library that makes this kind of data analysis easy and effective.

## What is Pandas?
Pandas is one of Python's most widely used libraries for data analysis and manipulation.

## What is a Pandas Series?
`pd.Series` is a class that provides a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Series is similar to a single column of an Excel sheet.

<b>Note:</b> labels need not be unique, but they must be hashable.

### Hashable
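An object is hashable if it has a hash value that never changes during its lifetime. Most built-in immutable objects (strings, numbers, tuples whose elements are all hashable, frozensets) are hashable, while mutable containers (lists, sets, dictionaries) are not. You can check whether an object is hashable with the <b>hash</b> function: it returns an integer for hashable objects and raises a `TypeError` otherwise. The integers shown below will differ from run to run, because Python randomizes string hashes per process.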

In [1]:
hash("krishna")

-1239309652367938068

In [2]:
hash(tuple(range(10)))

-4181190870548101704

In [3]:
# hash(set(range(10)))  # raises TypeError: list, set and dict are unhashable; tuple, frozenset and str are hashable
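The rule above can be checked with a small hypothetical helper, `is_hashable`, that simply calls `hash()` and catches the `TypeError`. It also shows one subtlety: a tuple is immutable, yet a tuple containing a list cannot be hashed, and `isinstance(obj, collections.abc.Hashable)` does not detect that case.

```python
from collections.abc import Hashable


def is_hashable(obj) -> bool:
    """Hypothetical helper: True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = {
    "str": "krishna",
    "tuple of ints": tuple(range(10)),
    "frozenset": frozenset(range(3)),
    "list": list(range(3)),
    "set": set(range(3)),
    "dict": {"k": "A"},
    "tuple containing a list": (1, [2, 3]),
}

for name, obj in samples.items():
    print(f"{name:<25} hashable={is_hashable(obj)}")

# The ABC check only looks for a __hash__ method, so it reports True here
# even though hash((1, [2, 3])) raises TypeError.
print(isinstance((1, [2, 3]), Hashable))
```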

## Ways to create the Pandas Series

In [4]:
# !pip install pandas  # only needed if pandas is not already installed in your Jupyter environment

In [5]:
import pandas as pd

In [6]:
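# empty series (pass dtype explicitly: without it, newer pandas versions warn or default to object dtype)
ser = pd.Series(dtype="float64")
print(ser)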

Series([], dtype: float64)


In [7]:
# creating series from numpy arrays

In [8]:
import numpy as np

In [9]:
data = np.array(list("krishna"))

In [10]:
data

array(['k', 'r', 'i', 's', 'h', 'n', 'a'], dtype='<U1')

In [11]:
ser = pd.Series(data)

In [12]:
ser

0    k
1    r
2    i
3    s
4    h
5    n
6    a
dtype: object

In [13]:
# creating a series from a python dictionary (the keys become the index)
data = {x:y for x,y in zip("krishna","ABCDEFG")}

In [14]:
data

{'k': 'A', 'r': 'B', 'i': 'C', 's': 'D', 'h': 'E', 'n': 'F', 'a': 'G'}

In [15]:
ser = pd.Series(data)

In [16]:
ser

k    A
r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [17]:
ser['k']

'A'

In [18]:
# creating a series from a scalar value (the value is repeated for every index label)
s = pd.Series(10,index=range(20,50))

In [19]:
s

20    10
21    10
22    10
23    10
24    10
25    10
26    10
27    10
28    10
29    10
30    10
31    10
32    10
33    10
34    10
35    10
36    10
37    10
38    10
39    10
40    10
41    10
42    10
43    10
44    10
45    10
46    10
47    10
48    10
49    10
dtype: int64
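The construction routes above can be compared side by side in one self-contained cell. One behaviour not shown above is worth knowing: when a dictionary is combined with an explicit `index`, pandas looks values up by key in the order of `index`, and labels missing from the dictionary become `NaN`. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# empty Series; passing dtype explicitly avoids a warning in newer pandas
empty = pd.Series(dtype="float64")

# from a NumPy array: the index defaults to 0..n-1
from_array = pd.Series(np.array(list("krishna")))

# from a dict with an explicit index: values are looked up by key,
# and the label "z", which is not a key, becomes NaN
letters = {x: y for x, y in zip("krishna", "ABCDEFG")}
from_dict = pd.Series(letters, index=["r", "k", "z"])

# from a scalar: the value is repeated for every label
from_scalar = pd.Series(10, index=range(20, 25))

print(empty)
print(from_array.head(3))
print(from_dict)
print(from_scalar)
```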

# Accessing the elements of a Series object

There are two ways to access the elements of a Series object:

- by position
- by label (index)

In [20]:
ser

k    A
r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [21]:
# accessing by position (newer pandas versions deprecate this fallback for non-integer indexes; prefer ser.iloc[0])
ser[0]

'A'

In [22]:
ser[:5]

k    A
r    B
i    C
s    D
h    E
dtype: object

In [23]:
# accessing the elements using labels (index)
ser['r']

'B'

In [24]:
ser['r':'h'] # label-based slices include the end label

r    B
i    C
s    D
h    E
dtype: object

In [25]:
s = pd.Series(50,index = list("KRISHNAKRISHNA"))

In [26]:
s

K    50
R    50
I    50
S    50
H    50
N    50
A    50
K    50
R    50
I    50
S    50
H    50
N    50
A    50
dtype: int64

In [27]:
s['K']

K    50
K    50
dtype: int64
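The difference between positional and label-based access is easiest to see by putting `iloc` and `loc` next to each other on the same data as above. Positional slices exclude the end position, label slices include the end label, and a label that occurs more than once returns a Series rather than a single value.

```python
import pandas as pd

ser = pd.Series(list("ABCDEFG"), index=list("krishna"))

first_three_by_position = ser.iloc[0:3]   # positions 0, 1, 2 (end excluded)
first_three_by_label = ser.loc["k":"i"]   # labels k, r, i (end included)
single = ser.loc["k"]                     # unique label -> scalar

dup = pd.Series(50, index=list("KRISHNAKRISHNA"))
repeated = dup.loc["K"]                   # duplicated label -> Series of length 2

print(first_three_by_position.tolist(), first_three_by_label.tolist())
print(single, type(repeated).__name__, len(repeated))
```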

#### Some commonly used attributes and methods of a pandas Series object

In [28]:
s.shape

(14,)

In [29]:
s.describe() # returns summary statistics of the series elements

count    14.0
mean     50.0
std       0.0
min      50.0
25%      50.0
50%      50.0
75%      50.0
max      50.0
dtype: float64

In [30]:
s.count()

14

In [31]:
s.mean()

50.0
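In the example above every value is 50, so `count()` and `shape` agree. They differ once missing values appear: `shape`, `size` and `len()` count every element, while `count()`, `mean()` and `describe()` skip `NaN`. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([50.0, np.nan, 30.0, 10.0], index=list("KRIS"))

print(s.shape, s.size, len(s))   # every element, including NaN
print(s.count())                 # non-missing elements only
print(s.mean())                  # NaN is skipped: (50 + 30 + 10) / 3
print(s.describe()["count"])     # describe() reports the same count
```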

### Head & Tail

In [32]:
s.head() # it will show the first five elements

K    50
R    50
I    50
S    50
H    50
dtype: int64

In [33]:
s.head(3) # it will show the first n elements

K    50
R    50
I    50
dtype: int64

In [34]:
s.tail()

I    50
S    50
H    50
N    50
A    50
dtype: int64

In [35]:
s.tail(3)

H    50
N    50
A    50
dtype: int64

In [36]:
s.sample(3) # 3 randomly chosen elements; the output varies between runs

H    50
K    50
R    50
dtype: int64
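Two details about these helpers: `sample()` draws different elements on every call unless you pass `random_state`, and `head()`/`tail()` also accept a negative `n`, which returns everything except the last (or first) `|n|` elements. A short sketch on the same 14-element Series:

```python
import pandas as pd

s = pd.Series(50, index=list("KRISHNAKRISHNA"))

# a fixed random_state makes the sample reproducible
a = s.sample(3, random_state=42)
b = s.sample(3, random_state=42)

all_but_last_3 = s.head(-3)   # first 11 elements
all_but_first_3 = s.tail(-3)  # last 11 elements

print(a.index.tolist(), b.index.tolist())
print(len(all_but_last_3), len(all_but_first_3))
```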

### Append operation
Appending adds elements to the end (tail) of a Series.

In [37]:
s

K    50
R    50
I    50
S    50
H    50
N    50
A    50
K    50
R    50
I    50
S    50
H    50
N    50
A    50
dtype: int64

In [38]:
ser

k    A
r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [39]:
pd.concat([s, ser]) # returns a new Series; s and ser are unchanged (Series.append was removed in pandas 2.0)

K    50
R    50
I    50
S    50
H    50
N    50
A    50
K    50
R    50
I    50
S    50
H    50
N    50
A    50
k     A
r     B
i     C
s     D
h     E
n     F
a     G
dtype: object
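`Series.append` was deprecated in pandas 1.4 and removed in 2.0; `pd.concat` is its replacement. It takes a list, so several Series can be joined in one call, and `ignore_index=True` renumbers the result instead of keeping the (possibly duplicated) labels. A minimal sketch:

```python
import pandas as pd

s = pd.Series([50, 50], index=["K", "R"])
ser = pd.Series(["A", "B"], index=["k", "r"])

combined = pd.concat([s, ser])                       # keeps the original labels
renumbered = pd.concat([s, ser], ignore_index=True)  # new 0..n-1 index

print(combined)
print(renumbered)
print(len(s), len(ser))  # the inputs are not modified
```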

In [40]:
# Deletion operations on a Series object
ser

k    A
r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [41]:
del ser['k']

In [42]:
ser

r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [43]:
ser.drop(['i','n']) # returns a new Series; ser itself is unchanged

r    B
s    D
h    E
a    G
dtype: object

In [44]:
ser.drop_duplicates()

r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [45]:
ser

r    B
i    C
s    D
h    E
n    F
a    G
dtype: object

In [46]:
s.drop_duplicates()

K    50
dtype: int64
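`drop_duplicates()` compares values, not labels, and by default keeps the first occurrence. Its `keep` argument changes that: `keep="last"` keeps the last occurrence and `keep=False` removes every value that appears more than once. `drop()` removes by label and, like `drop_duplicates()`, returns a new Series. Illustrative values:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 3], index=list("abcdef"))

keep_first = s.drop_duplicates()             # a, b, d
keep_last = s.drop_duplicates(keep="last")   # a, c, f
keep_none = s.drop_duplicates(keep=False)    # only a (value 1 is unique)

# drop by label; errors="ignore" skips labels that do not exist
without_b = s.drop(["b", "zzz"], errors="ignore")

print(keep_first.index.tolist(), keep_last.index.tolist(), keep_none.index.tolist())
print(without_b.index.tolist(), len(s))  # s itself still has 6 elements
```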

In [47]:
np.nan # Not a Number (use np.nan: the np.NaN alias was removed in NumPy 2.0)

nan

In [48]:
s

K    50
R    50
I    50
S    50
H    50
N    50
A    50
K    50
R    50
I    50
S    50
H    50
N    50
A    50
dtype: int64

In [49]:
s['K'] = np.nan # sets every element labeled 'K'

In [50]:
s

K     NaN
R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
K     NaN
R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
dtype: float64

In [51]:
s.dropna()

R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
dtype: float64

In [52]:
s.dropna(inplace=True) # modifies the original series in place (and returns None)

In [53]:
s

R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
R    50.0
I    50.0
S    50.0
H    50.0
N    50.0
A    50.0
dtype: float64

In [54]:
s['S'] = np.nan

In [55]:
s

R    50.0
I    50.0
S     NaN
H    50.0
N    50.0
A    50.0
R    50.0
I    50.0
S     NaN
H    50.0
N    50.0
A    50.0
dtype: float64

In [56]:
s.fillna(30)

R    50.0
I    50.0
S    30.0
H    50.0
N    50.0
A    50.0
R    50.0
I    50.0
S    30.0
H    50.0
N    50.0
A    50.0
dtype: float64
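The missing-value tools above can be summarised in one cell: `isna()` marks the gaps, `fillna()` replaces them with a constant, `ffill()` copies the previous valid value forward, and the `inplace=True` variants modify the Series and return `None`. The values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([50.0, np.nan, 40.0, np.nan], index=list("RSHS"))

print(s.isna().sum())         # number of missing values
print(s.fillna(30).tolist())  # constant replacement
print(s.ffill().tolist())     # forward fill from the previous value

t = s.copy()
result = t.dropna(inplace=True)  # modifies t, returns None
print(result, t.tolist())
```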

In [57]:
s = pd.Series(range(11,55,2))

In [58]:
s

0     11
1     13
2     15
3     17
4     19
5     21
6     23
7     25
8     27
9     29
10    31
11    33
12    35
13    37
14    39
15    41
16    43
17    45
18    47
19    49
20    51
21    53
dtype: int64

In [59]:
s[s.between(20,30)]

5    21
6    23
7    25
8    27
9    29
dtype: int64

In [60]:
s.filter(items=range(5,10))

5    21
6    23
7    25
8    27
9    29
dtype: int64
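`between(left, right)` is shorthand for a value-based boolean mask with both ends included, while `filter(items=...)` selects by index label. In this Series the labels 5 to 9 happen to hold the values 21 to 29, which is why both cells above print the same rows. A sketch that makes the distinction explicit:

```python
import pandas as pd

s = pd.Series(range(11, 55, 2))

mask = s.between(20, 30)          # value-based, both ends inclusive
manual = (s >= 20) & (s <= 30)    # the same mask written out

by_value = s[mask]
by_label = s.filter(items=[0, 1, 2])  # labels 0, 1, 2 -> values 11, 13, 15

print(by_value.tolist(), by_label.tolist())
```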

In [61]:
data = "THE QUICK BROWN FOX JUMPS OVER LITTLE LAZY DOG".split()

In [62]:
s_data = {x:i for i,x in enumerate(data)}

In [63]:
s_data

{'THE': 0,
 'QUICK': 1,
 'BROWN': 2,
 'FOX': 3,
 'JUMPS': 4,
 'OVER': 5,
 'LITTLE': 6,
 'LAZY': 7,
 'DOG': 8}

In [64]:
s = pd.Series(s_data)

In [65]:
s

THE       0
QUICK     1
BROWN     2
FOX       3
JUMPS     4
OVER      5
LITTLE    6
LAZY      7
DOG       8
dtype: int64

In [66]:
s.filter(like="O")

BROWN    2
FOX      3
OVER     5
DOG      8
dtype: int64

In [67]:
s.filter(regex="^[A-D]")

BROWN    2
DOG      8
dtype: int64
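`filter(like=...)` and `filter(regex=...)` match against the index labels. The same selections can be written with the index's string methods, which also offer options such as case-insensitive matching. A sketch with the same words:

```python
import pandas as pd

words = "THE QUICK BROWN FOX JUMPS OVER LITTLE LAZY DOG".split()
s = pd.Series({w: i for i, w in enumerate(words)})

like_o = s[s.index.str.contains("O", regex=False)]  # same as s.filter(like="O")
starts_a_to_d = s[s.index.str.match(r"[A-D]")]      # same as s.filter(regex="^[A-D]")
case_insensitive = s[s.index.str.contains("o", case=False)]

print(like_o.index.tolist(), starts_a_to_d.index.tolist())
```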

# Sorting the Pandas Series

In [68]:
s

THE       0
QUICK     1
BROWN     2
FOX       3
JUMPS     4
OVER      5
LITTLE    6
LAZY      7
DOG       8
dtype: int64

In [69]:
s.sort_index()

BROWN     2
DOG       8
FOX       3
JUMPS     4
LAZY      7
LITTLE    6
OVER      5
QUICK     1
THE       0
dtype: int64

In [70]:
s.sort_index(ascending=False)

THE       0
QUICK     1
OVER      5
LITTLE    6
LAZY      7
JUMPS     4
FOX       3
DOG       8
BROWN     2
dtype: int64

In [71]:
s.sort_values(ascending=False)

DOG       8
LAZY      7
LITTLE    6
OVER      5
JUMPS     4
FOX       3
BROWN     2
QUICK     1
THE       0
dtype: int64
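`sort_index()` and `sort_values()` return new Series and leave `s` in its original order, as the next cell shows. When only the top few values are needed, `nlargest()` gives the same result as sorting in descending order and taking the head. A small check:

```python
import pandas as pd

words = "THE QUICK BROWN FOX JUMPS OVER LITTLE LAZY DOG".split()
s = pd.Series({w: i for i, w in enumerate(words)})

top3_sorted = s.sort_values(ascending=False).head(3)
top3 = s.nlargest(3)

print(top3.index.tolist())  # the three labels with the largest values
print(s.index[0])           # s itself is still in insertion order
```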

In [72]:
s

THE       0
QUICK     1
BROWN     2
FOX       3
JUMPS     4
OVER      5
LITTLE    6
LAZY      7
DOG       8
dtype: int64

In [73]:
s['THE':'FOX']

THE      0
QUICK    1
BROWN    2
FOX      3
dtype: int64

In [74]:
s[0:5]

THE      0
QUICK    1
BROWN    2
FOX      3
JUMPS    4
dtype: int64

# Special indexing and slicing
#### loc[ ]
`loc[]` accesses the elements of a series by label (index value). Label-based slices include the end label.


In [75]:
s.loc['THE']

0

In [76]:
s.loc['THE':'JUMPS']

THE      0
QUICK    1
BROWN    2
FOX      3
JUMPS    4
dtype: int64

#### iloc[ ]
`iloc[]` accesses the elements of a series by integer position only.

In [77]:
s.iloc[0]

0

In [78]:
s.iloc[0:5] # standard Python slicing rules apply: the end position is excluded

THE      0
QUICK    1
BROWN    2
FOX      3
JUMPS    4
dtype: int64
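Besides single labels and slices, `loc[]` accepts a list of labels or a boolean mask, and `iloc[]` accepts lists of positions and negative positions counted from the end. A sketch on the same Series:

```python
import pandas as pd

words = "THE QUICK BROWN FOX JUMPS OVER LITTLE LAZY DOG".split()
s = pd.Series({w: i for i, w in enumerate(words)})

picked = s.loc[["FOX", "DOG"]]  # list of labels
big = s.loc[s > 5]              # boolean mask
last_two = s.iloc[-2:]          # negative positions count from the end
some = s.iloc[[0, 2]]           # list of positions

print(picked.tolist(), big.index.tolist(), last_two.index.tolist(), some.index.tolist())
```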

## Binary operations with Series objects

In [79]:
x = pd.Series(range(10),dtype="int16")

In [80]:
y = pd.Series(range(7,20),dtype=np.int16)

In [81]:
x

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int16

In [82]:
y

0      7
1      8
2      9
3     10
4     11
5     12
6     13
7     14
8     15
9     16
10    17
11    18
12    19
dtype: int16

In [83]:
x + y

0      7.0
1      9.0
2     11.0
3     13.0
4     15.0
5     17.0
6     19.0
7     21.0
8     23.0
9     25.0
10     NaN
11     NaN
12     NaN
dtype: float64
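The `NaN` values at labels 10 to 12 come from index alignment: arithmetic between two Series matches elements by label, not by position, and a label present in only one operand produces `NaN` unless a `fill_value` is given. The effect is clearer with string labels (made-up values):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([10, 20, 30], index=["b", "c", "d"])

aligned = a + b                  # "a" and "d" exist in only one Series -> NaN
filled = a.add(b, fill_value=0)  # the missing side is treated as 0

print(aligned)
print(filled)
```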

In [84]:
x.add(y,fill_value=10).add(x,fill_value=45)

0      7.0
1     10.0
2     13.0
3     16.0
4     19.0
5     22.0
6     25.0
7     28.0
8     31.0
9     34.0
10    72.0
11    73.0
12    74.0
dtype: float64

In [85]:
x.sub(y)

0    -7.0
1    -7.0
2    -7.0
3    -7.0
4    -7.0
5    -7.0
6    -7.0
7    -7.0
8    -7.0
9    -7.0
10    NaN
11    NaN
12    NaN
dtype: float64

In [86]:
# s.mul(), s.div(), s.mod(), etc...


In [87]:
x

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int16

In [88]:
x[0] = np.nan # assigning NaN upcasts the int16 series to float64 (newer pandas versions warn about this)

In [89]:
x[1:5] = np.nan

In [90]:
x.sum(min_count=3) # NaN values are skipped; the result is NaN if fewer than 3 valid values exist

35.0
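`min_count` controls what a sum returns when most values are missing. By default, `NaN` is skipped and an all-`NaN` Series sums to `0.0`; with `min_count=k` the result is `NaN` unless at least `k` valid values are present. A minimal check:

```python
import numpy as np
import pandas as pd

x = pd.Series([np.nan, np.nan, 5.0, 6.0])

print(x.sum())                    # NaN skipped
print(x.sum(min_count=2))         # two valid values are enough
print(x.sum(min_count=3))         # only two valid values -> NaN
print(pd.Series([np.nan]).sum())  # an all-NaN Series sums to 0.0
```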

In [91]:
x

0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    5.0
6    6.0
7    7.0
8    8.0
9    9.0
dtype: float64

In [92]:
y.mean()

13.0

In [93]:
y

0      7
1      8
2      9
3     10
4     11
5     12
6     13
7     14
8     15
9     16
10    17
11    18
12    19
dtype: int16

In [94]:
x = pd.Series(range(20,33))

In [95]:
x

0     20
1     21
2     22
3     23
4     24
5     25
6     26
7     27
8     28
9     29
10    30
11    31
12    32
dtype: int64

In [96]:
y.cov(x)

15.166666666666666

In [97]:
x.drop([0],inplace=True)

In [98]:
x.gt(y,fill_value = 0)

0     False
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool

In [99]:
# other comparison methods: lt(), ge(), le(), ne(), eq()
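The comparison methods (`gt`, `lt`, `ge`, `le`, `ne`, `eq`) align on the index and accept `fill_value`, whereas the operator form (`x > y`) requires both Series to have identical labels and raises a `ValueError` otherwise. A small sketch with made-up values:

```python
import pandas as pd

x = pd.Series([21, 22], index=[1, 2])
y = pd.Series([7, 8, 9], index=[0, 1, 2])

result = x.gt(y, fill_value=0)  # label 0: 0 > 7 is False

try:
    x > y
    operator_raised = False
except ValueError:  # "Can only compare identically-labeled Series objects"
    operator_raised = True

print(result.tolist(), operator_raised)
```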

In [100]:
x

1     21
2     22
3     23
4     24
5     25
6     26
7     27
8     28
9     29
10    30
11    31
12    32
dtype: int64

In [101]:
x.astype(str)

1     21
2     22
3     23
4     24
5     25
6     26
7     27
8     28
9     29
10    30
11    31
12    32
dtype: object

In [102]:
x = x.astype(str) # astype returns a new Series, so reassign it to keep the result

In [103]:
x

1     21
2     22
3     23
4     24
5     25
6     26
7     27
8     28
9     29
10    30
11    31
12    32
dtype: object

In [104]:
x.dtype

dtype('O')

In [105]:
x.tolist()

['21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32']
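Converting back from strings is where `astype` becomes strict: `astype(int)` raises a `ValueError` on any value that is not a valid number. `pd.to_numeric(..., errors="coerce")` is the more forgiving alternative and turns unparseable entries into `NaN`. Illustrative values:

```python
import pandas as pd

as_text = pd.Series(["21", "22", "abc"])

numbers = pd.to_numeric(as_text, errors="coerce")  # "abc" becomes NaN

try:
    as_text.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

print(numbers.tolist(), astype_failed)
```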

In [106]:
ser = pd.read_csv("https://bit.ly/uforeports")['City']

In [107]:
data

['THE', 'QUICK', 'BROWN', 'FOX', 'JUMPS', 'OVER', 'LITTLE', 'LAZY', 'DOG']

In [108]:
x = pd.Series(s_data)

In [109]:
x

THE       0
QUICK     1
BROWN     2
FOX       3
JUMPS     4
OVER      5
LITTLE    6
LAZY      7
DOG       8
dtype: int64

In [110]:
print(dir(x))

['BROWN', 'DOG', 'FOX', 'JUMPS', 'LAZY', 'LITTLE', 'OVER', 'QUICK', 'T', 'THE', '_AXIS_ALIASES', '_AXIS_IALIASES', '_AXIS_LEN', '_AXIS_NAMES', '_AXIS_NUMBERS', '_AXIS_ORDERS', '_AXIS_REVERSED', '_HANDLED_TYPES', '__abs__', '__add__', '__and__', '__array__', '__array_priority__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__div__', '__divmod__', '__doc__', '__eq__', '__finalize__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__imod__', '__imul__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pos__',

In [111]:
x.apply(lambda v: v * v) # calls the function once per element

THE        0
QUICK      1
BROWN      4
FOX        9
JUMPS     16
OVER      25
LITTLE    36
LAZY      49
DOG       64
dtype: int64
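`apply()` calls the Python function once per element. For simple arithmetic like squaring, the vectorised form `x ** 2` gives the same result and is generally much faster on large Series, so `apply` is best kept for logic that has no vectorised equivalent. A quick check:

```python
import pandas as pd

words = "THE QUICK BROWN FOX JUMPS OVER LITTLE LAZY DOG".split()
x = pd.Series({w: i for i, w in enumerate(words)})

squared_apply = x.apply(lambda v: v * v)  # Python function per element
squared_vector = x ** 2                   # vectorised, same values and index

print(squared_vector.tail(3))
```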

In [112]:
state = pd.read_csv("https://bit.ly/uforeports")['State']

In [113]:
state

0        NY
1        NJ
2        CO
3        KS
4        NY
         ..
18236    IL
18237    IA
18238    WI
18239    WI
18240    FL
Name: State, Length: 18241, dtype: object

In [114]:
ser

0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

In [115]:
temp = dict(ser.iloc[:10])

In [116]:
states = state.iloc[:5] # the first five values of the State column

In [117]:
states

0    NY
1    NJ
2    CO
3    KS
4    NY
Name: State, dtype: object

In [118]:
temp

{0: 'Ithaca',
 1: 'Willingboro',
 2: 'Holyoke',
 3: 'Abilene',
 4: 'New York Worlds Fair',
 5: 'Valley City',
 6: 'Crater Lake',
 7: 'Alma',
 8: 'Eklutna',
 9: 'Hubbard'}

In [119]:
x.map(temp) # replaces each value v of x with temp[v]

THE                     Ithaca
QUICK              Willingboro
BROWN                  Holyoke
FOX                    Abilene
JUMPS     New York Worlds Fair
OVER               Valley City
LITTLE             Crater Lake
LAZY                      Alma
DOG                    Eklutna
dtype: object
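With a dictionary, `map()` looks up each value as a key, and values with no matching key become `NaN`. Every value of `x` (0 to 8) found a city above because `temp` has the keys 0 to 9. A sketch with a deliberately missing key (made-up lookup table):

```python
import pandas as pd

codes = pd.Series([0, 1, 5], index=["THE", "QUICK", "BROWN"])
lookup = {0: "Ithaca", 1: "Willingboro"}  # no entry for 5

mapped = codes.map(lookup)               # BROWN -> NaN
with_default = mapped.fillna("Unknown")  # fill the gaps afterwards
lengths = mapped.map(len, na_action="ignore")  # functions work too; NaN is skipped

print(mapped)
print(with_default.tolist())
```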

## Pandas DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
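Each part of that definition can be checked directly: every column has its own dtype (heterogeneous), columns can be added after creation (size-mutable), and both axes carry labels. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["krishna", "rama", "tenali"], "Age": [45, 50, 78]})

df["Score"] = [9.5, 8.0, 7.0]  # size-mutable: add a column after creation

print(df.dtypes)               # one dtype per column
print(df.index.tolist(), df.columns.tolist())
print(df.shape)
```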

In [120]:
pd.__version__

'0.25.3'

In [121]:
data = [['krishna',45],['rama',50],['tenali',78]]

In [122]:
df = pd.DataFrame(data,columns=['Name',"Age"])

In [123]:
df.head()

|   | Name | Age |
|---|---|---|
| 0 | krishna | 45 |
| 1 | rama | 50 |
| 2 | tenali | 78 |


In [124]:
print(df.head())

      Name  Age
0  krishna   45
1     rama   50
2   tenali   78


In [125]:
df = pd.DataFrame({
    'Name':list('ABCDEFGHI'),
    'AGE':range(1,10)
})

In [126]:
df

|   | Name | AGE |
|---|---|---|
| 0 | A | 1 |
| 1 | B | 2 |
| 2 | C | 3 |
| 3 | D | 4 |
| 4 | E | 5 |
| 5 | F | 6 |
| 6 | G | 7 |
| 7 | H | 8 |
| 8 | I | 9 |


In [127]:
pd.DataFrame(
[
    dict(a=1,b=2,c=3),
    dict(a=10,b=20,c=30)
]
)

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 10 | 20 | 30 |


In [128]:
name = list("ABCDEFGHI")
age = list(range(10,20))

In [129]:
pd.DataFrame(zip(name,age),columns=['NAME','AGE']) # zip stops at the shorter list, so only 9 rows are created

|   | NAME | AGE |
|---|---|---|
| 0 | A | 10 |
| 1 | B | 11 |
| 2 | C | 12 |
| 3 | D | 13 |
| 4 | E | 14 |
| 5 | F | 15 |
| 6 | G | 16 |
| 7 | H | 17 |
| 8 | I | 18 |


In [130]:
# in real-world problems you will usually load the dataset from various sources.

# reading the csv

df = pd.read_csv("data.csv")

In [131]:
df.head()

|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | New Delhi | 31.0 | 68.0 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20.0 | 77.0 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18.0 | 76.0 | Awesome |
| 3 | 19-01-2020 | Kerala | 4.0 | 39.0 | Awesome |
| 4 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |


In [132]:
df.Temprature.dtype

dtype('float64')

In [133]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
Date             20 non-null object
City_or_State    20 non-null object
Temprature       20 non-null float64
Wind Speed       20 non-null float64
Event            20 non-null object
dtypes: float64(2), object(3)
memory usage: 968.0+ bytes


In [134]:
df.shape

(21, 5)

In [135]:
# skiprows=n skips the first n lines of a CSV or Excel file; the header line counts as one of them, so the next remaining line becomes the header

In [136]:
df = pd.read_csv('data.csv',skiprows=5)
df.head()

|   | 25-01-2020 | Punjab | 30 | 59 | Cloudy |
|---|---|---|---|---|---|
| 0 | 31-01-2020 | Harayana | 40.0 | 45.0 | Cloudy |
| 1 | 06-02-2020 | New Delhi | 17.0 | 57.0 | Awesome |
| 2 | 12-02-2020 | Hyderabad | 14.0 | 82.0 | Awesome |
| 3 | 18-02-2020 | Utter Pradesh | 20.0 | 88.0 | Awesome |
| 4 | 24-02-2020 | Kerala | 14.0 | 39.0 | Awesome |


In [137]:
df = pd.read_csv('data.csv',skiprows=5,header=None)
df.head()

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |
| 1 | 31-01-2020 | Harayana | 40.0 | 45.0 | Cloudy |
| 2 | 06-02-2020 | New Delhi | 17.0 | 57.0 | Awesome |
| 3 | 12-02-2020 | Hyderabad | 14.0 | 82.0 | Awesome |
| 4 | 18-02-2020 | Utter Pradesh | 20.0 | 88.0 | Awesome |


In [138]:
df = pd.read_csv('data.csv',skiprows=5,header=None,names=["Date","State","Temp","WindSpeed","Event"])
df.head()

|   | Date | State | Temp | WindSpeed | Event |
|---|---|---|---|---|---|
| 0 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |
| 1 | 31-01-2020 | Harayana | 40.0 | 45.0 | Cloudy |
| 2 | 06-02-2020 | New Delhi | 17.0 | 57.0 | Awesome |
| 3 | 12-02-2020 | Hyderabad | 14.0 | 82.0 | Awesome |
| 4 | 18-02-2020 | Utter Pradesh | 20.0 | 88.0 | Awesome |


In [139]:
df

|   | Date | State | Temp | WindSpeed | Event |
|---|---|---|---|---|---|
| 0 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |
| 1 | 31-01-2020 | Harayana | 40.0 | 45.0 | Cloudy |
| 2 | 06-02-2020 | New Delhi | 17.0 | 57.0 | Awesome |
| 3 | 12-02-2020 | Hyderabad | 14.0 | 82.0 | Awesome |
| 4 | 18-02-2020 | Utter Pradesh | 20.0 | 88.0 | Awesome |
| 5 | 24-02-2020 | Kerala | 14.0 | 39.0 | Awesome |
| 6 | 01-03-2020 | Punjab | 8.0 | 33.0 | Awesome |
| 7 | 07-03-2020 | Harayana | 32.0 | 82.0 | Cloudy |
| 8 | 13-03-2020 | New Delhi | 40.0 | 90.0 | Cloudy |
| 9 | 19-03-2020 | Hyderabad | 5.0 | 87.0 | Awesome |
| ... | ... | ... | ... | ... | ... |


In [140]:
pd.read_csv('data.csv',nrows=2)

|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | New Delhi | 31 | 68 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20 | 77 | Awesome |


In [141]:
pd.read_csv('data.csv',skiprows=10,nrows=5)

|   | 24-02-2020 | Kerala | 14 | 39 | Awesome |
|---|---|---|---|---|---|
| 0 | 01-03-2020 | Punjab | 8 | 33 | Awesome |
| 1 | 07-03-2020 | Harayana | 32 | 82 | Cloudy |
| 2 | 13-03-2020 | New Delhi | 40 | 90 | Cloudy |
| 3 | 19-03-2020 | Hyderabad | 5 | 87 | Awesome |
| 4 | 25-03-2020 | Utter Pradesh | 31 | 45 | Cloudy |
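Because `data.csv` is not included here, the sketch below rebuilds a few of its rows (copied from the outputs above) in an in-memory string read through `io.StringIO`. It shows why `skiprows=5` turned a data row into the header: `skiprows=n` counts the header line too. Passing a list of line numbers instead keeps the header and skips only those lines.

```python
import io

import pandas as pd

csv_text = """Date,City_or_State,Temprature,Wind Speed,Event
01-01-2020,New Delhi,31,68,Cloudy
07-01-2020,Hyderabad,20,77,Awesome
13-01-2020,Utter Pradesh,18,76,Awesome
19-01-2020,Kerala,4,39,Awesome
25-01-2020,Punjab,30,59,Cloudy
"""

# skiprows=2 drops the header line and the first data row,
# so "07-01-2020,Hyderabad,..." becomes the new header
shifted = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# header=None plus names= keeps every remaining line as data
named = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None,
                    names=["Date", "State", "Temp", "WindSpeed", "Event"])

# a list of line numbers skips just those lines (line 0 is the header)
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], nrows=2)

print(shifted.columns.tolist())
print(named)
print(kept_header)
```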


In [142]:
df = pd.read_csv("data.csv")

In [143]:
df.tail()

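|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 16 | 06-04-2020 | Punjab | 23.0 | 78.0 | Cloudy |
| 17 | 12-04-2020 | Harayana | 22.0 | 54.0 | Cloudy |
| 18 | 18-04-2020 | New Delhi | 27.0 | 95.0 | Cloudy |
| 19 | 24-04-2020 | Hyderabad | 31.0 | 54.0 | Cloudy |
| 20 | NaN | NaN | NaN | NaN | NaN |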


In [144]:
df = pd.read_csv("data.csv",na_values="Cloudy")

In [145]:
df.tail()

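|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 16 | 06-04-2020 | Punjab | 23.0 | 78.0 | NaN |
| 17 | 12-04-2020 | Harayana | 22.0 | 54.0 | NaN |
| 18 | 18-04-2020 | New Delhi | 27.0 | 95.0 | NaN |
| 19 | 24-04-2020 | Hyderabad | 31.0 | 54.0 | NaN |
| 20 | NaN | NaN | NaN | NaN | NaN |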


In [146]:
df = pd.read_csv("data.csv",na_values={
    'Date':'06-04-2020',
    'City_or_State':['Haryana',"New Delhi"], # note: the data spells it 'Harayana', so those rows are not matched
    'Temprature':22.0,
    'Wind Speed':54.0,
    'Event':'Sunny'
})

In [147]:
df.head()

|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | NaN | 31.0 | 68.0 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20.0 | 77.0 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18.0 | 76.0 | Awesome |
| 3 | 19-01-2020 | Kerala | 4.0 | 39.0 | Awesome |
| 4 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |


In [148]:
df

|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | NaN | 31.0 | 68.0 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20.0 | 77.0 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18.0 | 76.0 | Awesome |
| 3 | 19-01-2020 | Kerala | 4.0 | 39.0 | Awesome |
| 4 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |
| 5 | 31-01-2020 | Harayana | 40.0 | 45.0 | Cloudy |
| 6 | 06-02-2020 | NaN | 17.0 | 57.0 | Awesome |
| 7 | 12-02-2020 | Hyderabad | 14.0 | 82.0 | Awesome |
| 8 | 18-02-2020 | Utter Pradesh | 20.0 | 88.0 | Awesome |
| 9 | 24-02-2020 | Kerala | 14.0 | 39.0 | Awesome |
| ... | ... | ... | ... | ... | ... |
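`na_values` matches cell values exactly, spelling and case included, so a mismatch silently leaves the value alone; that is why the `Harayana` rows above survived a `na_values` entry for `Haryana`. A small sketch with an in-memory CSV (illustrative rows, including a made-up `-99` sentinel value):

```python
import io

import pandas as pd

csv_text = """City_or_State,Temprature
New Delhi,31
Harayana,22
Kerala,-99
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={
        "City_or_State": ["Haryana", "New Delhi"],  # "Haryana" never matches "Harayana"
        "Temprature": ["-99"],                       # sentinel value treated as missing
    },
)

print(df)
print(df.isna().sum())
```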


In [149]:
df.fillna("12",inplace=True) # replaces every remaining NaN, in every column, with the string "12"

In [150]:
df

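|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | 12 | 31 | 68 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20 | 77 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18 | 76 | Awesome |
| 3 | 19-01-2020 | Kerala | 4 | 39 | Awesome |
| 4 | 25-01-2020 | Punjab | 30 | 59 | Cloudy |
| 5 | 31-01-2020 | Harayana | 40 | 45 | Cloudy |
| 6 | 06-02-2020 | 12 | 17 | 57 | Awesome |
| 7 | 12-02-2020 | Hyderabad | 14 | 82 | Awesome |
| 8 | 18-02-2020 | Utter Pradesh | 20 | 88 | Awesome |
| 9 | 24-02-2020 | Kerala | 14 | 39 | Awesome |
| ... | ... | ... | ... | ... | ... |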


In [151]:
df.to_csv("mynewdata.csv",index=False) # index=False: do not write the row index as an extra column

In [152]:
pd.read_csv("mynewdata.csv").head()

|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | 12 | 31.0 | 68.0 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20.0 | 77.0 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18.0 | 76.0 | Awesome |
| 3 | 19-01-2020 | Kerala | 4.0 | 39.0 | Awesome |
| 4 | 25-01-2020 | Punjab | 30.0 | 59.0 | Cloudy |


In [153]:
df

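|   | Date | City_or_State | Temprature | Wind Speed | Event |
|---|---|---|---|---|---|
| 0 | 01-01-2020 | 12 | 31 | 68 | Cloudy |
| 1 | 07-01-2020 | Hyderabad | 20 | 77 | Awesome |
| 2 | 13-01-2020 | Utter Pradesh | 18 | 76 | Awesome |
| 3 | 19-01-2020 | Kerala | 4 | 39 | Awesome |
| 4 | 25-01-2020 | Punjab | 30 | 59 | Cloudy |
| 5 | 31-01-2020 | Harayana | 40 | 45 | Cloudy |
| 6 | 06-02-2020 | 12 | 17 | 57 | Awesome |
| 7 | 12-02-2020 | Hyderabad | 14 | 82 | Awesome |
| 8 | 18-02-2020 | Utter Pradesh | 20 | 88 | Awesome |
| 9 | 24-02-2020 | Kerala | 14 | 39 | Awesome |
| ... | ... | ... | ... | ... | ... |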


In [154]:
df.City_or_State.unique()

array(['12', 'Hyderabad', 'Utter Pradesh', 'Kerala', 'Punjab', 'Harayana'],
      dtype=object)
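When a file written with the default `index=True` is read back, pandas shows the saved row index as an extra `Unnamed: 0` column; writing with `index=False`, as above, avoids it. `fillna()` also accepts a dictionary with a different replacement per column, which keeps numeric columns numeric instead of mixing in a string such as `"12"`. A round-trip sketch using strings instead of files (illustrative data):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"City_or_State": ["Hyderabad", None, "Kerala"],
                   "Temprature": [20.0, np.nan, 4.0]})

# per-column replacements: a label for text, the column mean for numbers
filled = df.fillna({"City_or_State": "Unknown",
                    "Temprature": df["Temprature"].mean()})

with_index = pd.read_csv(io.StringIO(filled.to_csv()))  # index written
without_index = pd.read_csv(io.StringIO(filled.to_csv(index=False)))

print(with_index.columns.tolist())  # starts with 'Unnamed: 0'
print(without_index.columns.tolist())
print(without_index)
```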

In [155]:
df = pd.read_clipboard(sep=",") # reads the text currently on the clipboard into a DataFrame; here the clipboard held a single URL, so the result has one column and no rows

In [156]:
df.head()

|   | https://us04web.zoom.us/j/5263681981?pwd=WHRKR0x2RitkNUhpNjZ3enhqd2hqdz09 |
|---|---|


In [157]:
df = pd.read_csv("http://bit.ly/uforeports")
df.head()

|   | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
| 3 | Abilene | NaN | DISK | KS | 6/1/1931 13:00 |
| 4 | New York Worlds Fair | NaN | LIGHT | NY | 4/18/1933 19:00 |


In [158]:
df = pd.read_csv("https://bit.ly/drinksbycountry")

In [159]:
df.to_excel("newexceldata.xlsx", sheet_name="drinksdata", index=False) # writing .xlsx files requires an engine such as openpyxl

In [160]:
#reading df from excel file
df = pd.read_excel("newexceldata.xlsx",na_values=0)

In [161]:
df.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | NaN | Asia |
| 1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
| 4 | Angola | 217 | 57 | 45 | 5.9 | Africa |


In [162]:
type(df.beer_servings[0])

numpy.int64

In [163]:
df = pd.read_excel("newexceldata.xlsx",converters={'beer_servings':lambda x:np.nan if x == 0 else x})

In [164]:
df.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | NaN | 0 | 0 | 0.0 | Asia |
| 1 | Albania | 89.0 | 132 | 54 | 4.9 | Europe |
| 2 | Algeria | 25.0 | 0 | 14 | 0.7 | Africa |
| 3 | Andorra | 245.0 | 138 | 312 | 12.4 | Europe |
| 4 | Angola | 217.0 | 57 | 45 | 5.9 | Africa |


In [165]:
df = pd.read_excel("newexceldata.xlsx",na_values={
    'beer_servings':0.0
})

In [166]:
df.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
| 1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
| 4 | Angola | 217 | 57 | 45 | 5.9 | Africa |


In [167]:
pd.read_csv("https://bit.ly/uforeports")

|   | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
| 3 | Abilene | NaN | DISK | KS | 6/1/1931 13:00 |
| 4 | New York Worlds Fair | NaN | LIGHT | NY | 4/18/1933 19:00 |
| ... | ... | ... | ... | ... | ... |
| 18236 | Grant Park | NaN | TRIANGLE | IL | 12/31/2000 23:00 |
| 18237 | Spirit Lake | NaN | DISK | IA | 12/31/2000 23:00 |
| 18238 | Eagle River | NaN | NaN | WI | 12/31/2000 23:45 |
| 18239 | Eagle River | RED | LIGHT | WI | 12/31/2000 23:45 |


In [168]:
first3cap = lambda x:x[:3].upper()
df = pd.read_csv('https://bit.ly/drinksbycountry',converters={'country':first3cap})

In [169]:
df.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | AFG | 0 | 0 | 0 | 0.0 | Asia |
| 1 | ALB | 89 | 132 | 54 | 4.9 | Europe |
| 2 | ALG | 25 | 0 | 14 | 0.7 | Africa |
| 3 | AND | 245 | 138 | 312 | 12.4 | Europe |
| 4 | ANG | 217 | 57 | 45 | 5.9 | Africa |
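`converters` maps a column name to a function applied to each cell while the file is parsed. With `read_csv` the function receives the cell's raw text, so a zero check has to compare against the string `"0"`; the `read_excel` converter above could compare against the number `0` because, as its output shows, the Excel reader passed the already-typed cell value. A sketch with two rows copied from the output above:

```python
import io

import numpy as np
import pandas as pd

csv_text = """country,beer_servings
Afghanistan,0
Albania,89
"""


def first3cap(name: str) -> str:
    """First three letters of the country name, upper-cased."""
    return name[:3].upper()


def zero_to_nan(value: str) -> float:
    """read_csv passes the raw string, so compare with "0", not 0."""
    return np.nan if value == "0" else float(value)


df = pd.read_csv(io.StringIO(csv_text),
                 converters={"country": first3cap, "beer_servings": zero_to_nan})
print(df)
```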


In [170]:
df1 = pd.read_csv('https://bit.ly/smallstocks')

In [171]:
df1

|   | Date | Close | Volume | Symbol |
|---|---|---|---|---|
| 0 | 2016-10-03 | 31.5 | 14070500 | CSCO |
| 1 | 2016-10-03 | 112.52 | 21701800 | AAPL |
| 2 | 2016-10-03 | 57.42 | 19189500 | MSFT |
| 3 | 2016-10-04 | 113.0 | 29736800 | AAPL |
| 4 | 2016-10-04 | 57.24 | 20085900 | MSFT |
| 5 | 2016-10-04 | 31.35 | 18460400 | CSCO |
| 6 | 2016-10-05 | 57.64 | 16726400 | MSFT |
| 7 | 2016-10-05 | 31.59 | 11808600 | CSCO |
| 8 | 2016-10-05 | 113.05 | 21453100 | AAPL |


In [172]:
df1.to_excel("stocks.xlsx", sheet_name="stock details") # note: each to_excel call below overwrites stocks.xlsx

In [173]:
df1.to_excel("stocks.xlsx", sheet_name="stock details", index=False) # without the index column

In [174]:
df1.to_excel("stocks.xlsx", sheet_name="stock details", index=False, startrow=9, startcol=6) # zero-based offsets: the table starts at cell G10

In [175]:
wb = pd.ExcelWriter("workbook.xlsx")
df1.to_excel(wb, sheet_name="stocks")
df.to_excel(wb, sheet_name="drinks")

In [176]:
wb.close() # saves the workbook with both sheets to disk

### Exercises

Datasets:

- https://bit.ly/smallstocks
- https://bit.ly/drinksbycountry
- https://bit.ly/uforeports
- https://bit.ly/chiporders

1. Read all of the datasets above and write them into a single Excel workbook using a for loop (writing a separate `to_excel` statement for each DataFrame is not allowed).
2. Read the chip orders dataset and create a new workbook in which each sheet holds the details of one order number.
3. Load the uforeports dataset into a DataFrame and, while loading, handle the missing values by specifying a replacement value for each column.

In [180]:
t = 'https://bit.ly/smallstocks'
t.split('/')

['https:', '', 'bit.ly', 'smallstocks']

In [183]:
# Exercise 1: read all of the datasets above and write them into one Excel workbook using a for loop

datasets_link = [
    'https://bit.ly/smallstocks',
    'https://bit.ly/drinksbycountry',
    'https://bit.ly/uforeports',
    'https://bit.ly/chiporders'
]

with pd.ExcelWriter("datasets.xlsx") as wb:
    for link in datasets_link:
        try:
            df = pd.read_csv(link)
        except pd.errors.ParserError:
            # fall back to tab separation (chiporders is a tab-separated file)
            df = pd.read_csv(link, sep="\t")
        # the last part of the URL (e.g. "smallstocks") becomes the sheet name
        df.to_excel(wb, sheet_name=link.split('/')[-1])
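A bare `except:` also swallows unrelated problems such as network failures, so catching `pd.errors.ParserError` is the narrower choice. There is a second trap: tab-separated text that happens to contain no commas does not raise at all; it parses into a single wide column. The hypothetical helper below handles both cases and works on text rather than URLs, so it runs offline.

```python
import io

import pandas as pd


def read_table_text(text: str) -> pd.DataFrame:
    """Hypothetical helper: parse comma-separated text, falling back to tabs."""
    try:
        df = pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError:
        # a later row has more commas than the header: not comma-separated
        return pd.read_csv(io.StringIO(text), sep="\t")
    # tab-separated text without any commas parses into one wide column
    if df.shape[1] == 1 and "\t" in str(df.columns[0]):
        return pd.read_csv(io.StringIO(text), sep="\t")
    return df


csv_df = read_table_text("a,b\n1,2\n")
tsv_no_commas = read_table_text("order_id\tquantity\n1\t2\n")
tsv_with_commas = read_table_text("order_id\tchoice\n1\tRice\n2\tRice, Beans\n")

print(csv_df.columns.tolist(), tsv_no_commas.columns.tolist())
print(tsv_with_commas)
```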
    
    

In [184]:
df = pd.read_csv('https://bit.ly/chiporders',sep="\t")

In [188]:
unique_order_id = list(df.order_id.unique())

In [192]:
with pd.ExcelWriter("orders.xlsx") as wb:
    for order in unique_order_id:
        # one sheet per order number
        df[df.order_id == order].to_excel(wb, sheet_name=f"order_no_{order}")
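Filtering the whole DataFrame once per order (`df[df.order_id == order]`) rescans every row for every order. `groupby('order_id')` splits the data in a single pass and yields each order's rows, which can then be written one sheet at a time. The sketch below uses made-up rows with a few of the chip-orders column names and builds the sheet-name to DataFrame mapping; the commented lines show where `to_excel` would go (writing `.xlsx` needs an engine such as openpyxl).

```python
import pandas as pd

# hypothetical sample rows, not the real chip orders data
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "quantity": [1, 1, 2, 1, 1, 1],
    "item_name": ["item_a", "item_b", "item_c", "item_a", "item_d", "item_e"],
})

sheets = {f"order_no_{order_id}": group
          for order_id, group in orders.groupby("order_id")}

# with pd.ExcelWriter("orders.xlsx") as wb:
#     for sheet_name, group in sheets.items():
#         group.to_excel(wb, sheet_name=sheet_name, index=False)

print(list(sheets))
print(sheets["order_no_3"])
```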
    

### LOADING THE DATAFRAME FROM A DATABASE

In [193]:
!pip install SQLAlchemy



In [194]:
!pip install pymysql mysql-connector



In [195]:
from sqlalchemy import create_engine

In [196]:
# replace user, password, host and database with your own MySQL connection details
engine = create_engine('mysql+pymysql://user:password@host/database')

In [198]:
pd.read_sql_table('drinks_data',engine).query("continent == 'Asia'")

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
| 12 | Bahrain | 42 | 63 | 7 | 2.0 | Asia |
| 13 | Bangladesh | 0 | 0 | 0 | 0.0 | Asia |
| 19 | Bhutan | 23 | 0 | 0 | 0.4 | Asia |
| 24 | Brunei | 31 | 2 | 1 | 0.6 | Asia |
| ... | ... | ... | ... | ... | ... | ... |
| 563 | Turkmenistan | 19 | 71 | 32 | 2.2 | Asia |
| 567 | United Arab Emirates | 16 | 135 | 5 | 2.8 | Asia |
| 572 | Uzbekistan | 25 | 101 | 8 | 2.4 | Asia |
| 575 | Vietnam | 111 | 2 | 1 | 2.0 | Asia |


In [201]:
asia = pd.read_sql_query("SELECT * FROM drinks_data WHERE continent = 'Asia'",engine)

In [202]:
asia.to_sql('Asia',engine,index=False)

In [203]:
import sqlite3 as sq

In [204]:
conn = sq.connect("mydatabase.db")


In [206]:
pd.read_sql("SELECT * FROM persons",conn)

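|   | first_name | last_name | age |
|---|---|---|---|
| 0 | A | B | 22 |
| 1 | E | A | 20 |
| 2 | E | E | 24 |
| 3 | I | I | 30 |
| 4 | H | A | 30 |
| 5 | A | G | 25 |
| 6 | H | F | 22 |
| 7 | E | F | 30 |
| 8 | A | H | 28 |
| 9 | E | A | 22 |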
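`to_sql` and `read_sql` also work with a plain `sqlite3` connection, without SQLAlchemy. The sketch below uses an in-memory database, so nothing is written to disk, and loads three rows copied from the `persons` output above. The `?` placeholder passes the filter value as a query parameter instead of formatting it into the SQL string.

```python
import sqlite3
from contextlib import closing

import pandas as pd

persons = pd.DataFrame({
    "first_name": ["A", "E", "E"],
    "last_name": ["B", "A", "E"],
    "age": [22, 20, 24],
})

with closing(sqlite3.connect(":memory:")) as conn:
    persons.to_sql("persons", conn, index=False)
    older = pd.read_sql("SELECT * FROM persons WHERE age > ? ORDER BY age",
                        conn, params=(21,))

print(older)
```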


Further exercises on the uforeports dataset:

1. Create a new DataFrame showing the frequency of UFO reports for each year.
2. What was the color of the most frequently seen UFO?
3. In which part of the day were most of the UFOs reported?
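As a starting point for these exercises, the sketch below works on the five rows shown in the `uforeports` output earlier. It parses the `Time` column with `pd.to_datetime`, counts reports per year into a new DataFrame, and buckets the hour into parts of the day. The 6-hour bucket edges and the labels (night, morning, afternoon, evening) are an assumption of this sketch, not part of the dataset.

```python
import pandas as pd

ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "New York Worlds Fair"],
    "Time": ["6/1/1930 22:00", "6/30/1930 20:00", "2/15/1931 14:00",
             "6/1/1931 13:00", "4/18/1933 19:00"],
})

when = pd.to_datetime(ufo["Time"], format="%m/%d/%Y %H:%M")

# exercise 1: reports per year, as a new DataFrame
per_year = (when.dt.year.value_counts()
            .sort_index()
            .rename_axis("Year")
            .reset_index(name="Reports"))

# exercise 3: part of the day, using assumed 6-hour buckets
part_of_day = pd.cut(when.dt.hour, bins=[0, 6, 12, 18, 24], right=False,
                     labels=["night", "morning", "afternoon", "evening"])

print(per_year)
print(part_of_day.value_counts())
```

On the full dataset, `df["Colors Reported"].value_counts().idxmax()` is one way to approach exercise 2, since `value_counts()` ignores missing colours.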