# Python Data Science Handbook

*Jake VanderPlas*

![Book Cover](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/figures/PDSH-cover.png?raw=1)

This is the Jupyter notebook version of the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).
The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!

## Table of Contents

### [Preface](00.00-Preface.ipynb)

### [1. IPython: Beyond Normal Python](01.00-IPython-Beyond-Normal-Python.ipynb)
- [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)
- [Keyboard Shortcuts in the IPython Shell](01.02-Shell-Keyboard-Shortcuts.ipynb)
- [IPython Magic Commands](01.03-Magic-Commands.ipynb)
- [Input and Output History](01.04-Input-Output-History.ipynb)
- [IPython and Shell Commands](01.05-IPython-And-Shell-Commands.ipynb)
- [Errors and Debugging](01.06-Errors-and-Debugging.ipynb)
- [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)
- [More IPython Resources](01.08-More-IPython-Resources.ipynb)

### [2. Introduction to NumPy](02.00-Introduction-to-NumPy.ipynb)
- [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb)
- [The Basics of NumPy Arrays](02.02-The-Basics-Of-NumPy-Arrays.ipynb)
- [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb)
- [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb)
- [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb)
- [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb)
- [Fancy Indexing](02.07-Fancy-Indexing.ipynb)
- [Sorting Arrays](02.08-Sorting.ipynb)
- [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb)

### [3. Data Manipulation with Pandas](03.00-Introduction-to-Pandas.ipynb)
- [Introducing Pandas Objects](03.01-Introducing-Pandas-Objects.ipynb)
- [Data Indexing and Selection](03.02-Data-Indexing-and-Selection.ipynb)
- [Operating on Data in Pandas](03.03-Operations-in-Pandas.ipynb)
- [Handling Missing Data](03.04-Missing-Values.ipynb)
- [Hierarchical Indexing](03.05-Hierarchical-Indexing.ipynb)
- [Combining Datasets: Concat and Append](03.06-Concat-And-Append.ipynb)
- [Combining Datasets: Merge and Join](03.07-Merge-and-Join.ipynb)
- [Aggregation and Grouping](03.08-Aggregation-and-Grouping.ipynb)
- [Pivot Tables](03.09-Pivot-Tables.ipynb)
- [Vectorized String Operations](03.10-Working-With-Strings.ipynb)
- [Working with Time Series](03.11-Working-with-Time-Series.ipynb)
- [High-Performance Pandas: eval() and query()](03.12-Performance-Eval-and-Query.ipynb)
- [Further Resources](03.13-Further-Resources.ipynb)

### [4. Visualization with Matplotlib](04.00-Introduction-To-Matplotlib.ipynb)
- [Simple Line Plots](04.01-Simple-Line-Plots.ipynb)
- [Simple Scatter Plots](04.02-Simple-Scatter-Plots.ipynb)
- [Visualizing Errors](04.03-Errorbars.ipynb)
- [Density and Contour Plots](04.04-Density-and-Contour-Plots.ipynb)
- [Histograms, Binnings, and Density](04.05-Histograms-and-Binnings.ipynb)
- [Customizing Plot Legends](04.06-Customizing-Legends.ipynb)
- [Customizing Colorbars](04.07-Customizing-Colorbars.ipynb)
- [Multiple Subplots](04.08-Multiple-Subplots.ipynb)
- [Text and Annotation](04.09-Text-and-Annotation.ipynb)
- [Customizing Ticks](04.10-Customizing-Ticks.ipynb)
- [Customizing Matplotlib: Configurations and Stylesheets](04.11-Settings-and-Stylesheets.ipynb)
- [Three-Dimensional Plotting in Matplotlib](04.12-Three-Dimensional-Plotting.ipynb)
- [Geographic Data with Basemap](04.13-Geographic-Data-With-Basemap.ipynb)
- [Visualization with Seaborn](04.14-Visualization-With-Seaborn.ipynb)
- [Further Resources](04.15-Further-Resources.ipynb)

### [5. Machine Learning](05.00-Machine-Learning.ipynb)
- [What Is Machine Learning?](05.01-What-Is-Machine-Learning.ipynb)
- [Introducing Scikit-Learn](05.02-Introducing-Scikit-Learn.ipynb)
- [Hyperparameters and Model Validation](05.03-Hyperparameters-and-Model-Validation.ipynb)
- [Feature Engineering](05.04-Feature-Engineering.ipynb)
- [In Depth: Naive Bayes Classification](05.05-Naive-Bayes.ipynb)
- [In Depth: Linear Regression](05.06-Linear-Regression.ipynb)
- [In-Depth: Support Vector Machines](05.07-Support-Vector-Machines.ipynb)
- [In-Depth: Decision Trees and Random Forests](05.08-Random-Forests.ipynb)
- [In Depth: Principal Component Analysis](05.09-Principal-Component-Analysis.ipynb)
- [In-Depth: Manifold Learning](05.10-Manifold-Learning.ipynb)
- [In Depth: k-Means Clustering](05.11-K-Means.ipynb)
- [In Depth: Gaussian Mixture Models](05.12-Gaussian-Mixtures.ipynb)
- [In-Depth: Kernel Density Estimation](05.13-Kernel-Density-Estimation.ipynb)
- [Application: A Face Detection Pipeline](05.14-Image-Features.ipynb)
- [Further Machine Learning Resources](05.15-Learning-More.ipynb)

### [Appendix: Figure Code](06.00-Figure-Code.ipynb)

In [None]:
import math
a=int(input("Enter side "))
b=int(input("Enter side "))
c=int(input("Enter side "))
p=(a+b+c)/2
area=math.sqrt(p*(p-a)*(p-b)*(p-c))
print(area)


Enter side 5
Enter side 6
Enter side 7
14.696938456699069


1) Area of Triangle:
Given the lengths of the three sides of a triangle, calculate its area.
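
The cell above applies Heron's formula directly, so side lengths that cannot form a triangle end in a `math domain error`. A minimal sketch of the same calculation with an explicit validity check follows; the function name `triangle_area` is a hypothetical helper, not part of the original exercise.

```python
import math

def triangle_area(a, b, c):
    """Area from three side lengths via Heron's formula."""
    # Reject non-positive sides and sides that violate the triangle inequality.
    if min(a, b, c) <= 0 or a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("sides do not form a triangle")
    p = (a + b + c) / 2  # semi-perimeter
    return math.sqrt(p * (p - a) * (p - b) * (p - c))

print(triangle_area(3, 4, 5))  # 6.0
```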


In [None]:
pal=input("Enter a string ")
if(pal==pal[::-1]):
  print(pal, "is palindrome")
else:
  print(pal, "is not palindrome")


Enter a string mom
mom is palindrome


2) Take a string from the user and check whether it is a palindrome.

In [None]:
year=int(input("Enter a year: "))
if(year%400==0):
    print(year,"is a leap year")
elif(year%100==0):
    print(year,"is not leap year")
elif(year%4==0):
    print(year,"is a leap year")
else:
    print(year,"is not leap year")


Enter a year: 2000
2000 is a leap year


3) Write a program that reads a year from the user and displays a message indicating whether or not it is a leap year.
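
The three-way test in the cell above can be folded into one boolean expression. The sketch below assumes the Gregorian rule and cross-checks it against the standard library's `calendar.isleap`; `is_leap` is a hypothetical helper name.

```python
import calendar

def is_leap(year):
    """True if year is a leap year under the Gregorian rule."""
    # Divisible by 4, except century years that are not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for y in (1900, 2000, 2019, 2020):
    # The two columns should agree for every year.
    print(y, is_leap(y), calendar.isleap(y))
```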

In [None]:
s=input("Enter a string")
print(s.replace(" ","-"))


Enter a stringhi i am 5F4
hi-i-am-5F4


4)Space To Hyphen problem 

Take a string as input, replace each space “ ” with a hyphen “-”, and return the resulting string.
Input: “ This program converts spaces into hyphen”
Output:     “ This-program-converts-spaces-into-hyphen”



In [None]:
s=input("Enter the words")
# Split on commas, strip stray spaces, drop duplicates, then sort
words=sorted(set(w.strip() for w in s.split(",")))
print(words)

Enter the wordsorange,white,red,cyan,green,magenta,cyan,pink,white
['cyan', 'green', 'magenta', 'orange', 'pink', 'red', 'white']


5)Unique Sort problem 

Take a comma-separated sequence of words as input and print the unique words in sorted (alphanumeric) order.
*Input*: orange, white, red, cyan, green, magenta, cyan, pink, white
*Output*: cyan, green, magenta, orange, pink, red, white


In [None]:
income=int(input("Enter your total income "))
if(income<=250000):
  print("You have to pay no tax")
elif(income>250000 and income<=500000):
  tax=5*(income/100)
  print("You need to pay ",tax)
elif(income>500000 and income<=750000):
  tax=10*(income/100)
  print("You need to pay ",tax)
elif(income>750000 and income<=1000000):
  tax=15*(income/100)
  print("You need to pay ",tax)
elif(income>1000000 and income<=1250000):
  tax=20*(income/100)
  print("You need to pay ",tax)
elif(income>1250000 and income<=1500000):
  tax=25*(income/100)
  print("You need to pay ",tax)
else:
  tax=30*(income/100)
  print("You need to pay ",tax)

Enter your total income 300000
You need to pay  15000.0


6)Tax Calculator

Ask the user for their monthly salary. Calculate whether they have to pay tax and, if so, how much. Print the result.
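
The chain of `elif` branches in the cell above can also be written as a lookup table. The sketch below is only an illustration: the slab limits and rates are copied from that cell (the problem text does not state them), and like the cell it applies one flat rate to the whole income rather than marginal rates. The names `SLABS`, `TOP_RATE` and `flat_tax` are hypothetical.

```python
# (upper limit of slab, rate in percent), taken from the cell above
SLABS = [
    (250000, 0),
    (500000, 5),
    (750000, 10),
    (1000000, 15),
    (1250000, 20),
    (1500000, 25),
]
TOP_RATE = 30  # rate applied above the last slab

def flat_tax(income):
    """Tax at the single rate of the slab the income falls into."""
    for upper, rate in SLABS:
        if income <= upper:
            return income * rate / 100
    return income * TOP_RATE / 100

print(flat_tax(300000))  # 15000.0
```

Changing a slab then means editing one row of the table instead of one branch of the conditional.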


In [None]:
l=input('Enter numbers separated by comma ').split(",")
s=""
s=s.join(l)
num=int(s)
print(num)


Enter numbers separated by comma 11,33,50
113350


7) Take a list of integers as an argument and convert it into a single integer (return the integer).
*Input*: [11, 33, 50]
*Output*: 113350
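
The statement asks for a function that takes a list, while the cell above reads a comma-separated string. A minimal sketch of the function form, assuming non-negative integers; `join_integers` is a hypothetical name.

```python
def join_integers(numbers):
    """Concatenate the decimal digits of each integer into one integer."""
    # Assumes every element is a non-negative integer.
    return int("".join(str(n) for n in numbers))

print(join_integers([11, 33, 50]))  # 113350
```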



In [None]:
a=int(input("Enter no of days "))
b=int(input("Enter no of hours "))
c=int(input("Enter no of minutes "))
d=int(input("Enter no of seconds "))
totsec=(a*86400)+(b*3600)+(c*60)+d
print(totsec,"seconds")

Enter no of days 2
Enter no of hours 5
Enter no of minutes 30
Enter no of seconds 34
192634 seconds


8)	Units of Time  
Create a program that reads duration from the user as a number of days, hours, minutes, and seconds. Compute and display the total number of seconds represented by this duration. 
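
As a cross-check on the hand-written arithmetic, the standard library's `datetime.timedelta` does the same conversion. This is a sketch, not the exercise's intended solution; `total_seconds` is a hypothetical helper name.

```python
from datetime import timedelta

def total_seconds(days, hours, minutes, seconds):
    """Total number of seconds in the given duration."""
    duration = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    return int(duration.total_seconds())

print(total_seconds(2, 5, 30, 34))  # 192634
```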


In [None]:
# Convert to int so that max/min compare numerically, not as strings
x=[int(v) for v in input('enter 3 integers: ').split()]
c=x.copy()
z=[]
for i in range(0,3):
    y=max(x)
    z.append(y)
    m=x.index(y)
    del x[m]
print('Sorted using max=',z)
z=[]
for i in range(0,3):
    y=min(c)
    z.append(y)
    m=c.index(y)
    del c[m]
print('Sorted using min=',z)

enter 3 integers: 10 20 30
Sorted using max= [30, 20, 10]
Sorted using min= [10, 20, 30]


9)	Sort 3 Integers    
Given three integers (given through user input), sort the numbers using the `min` and `max` functions.
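
With exactly three values, no list manipulation is needed: `min` and `max` give the two ends, and the middle value is whatever remains of the sum. A minimal sketch; `sort3` is a hypothetical name.

```python
def sort3(a, b, c):
    """Return the three integers in ascending order using only min and max."""
    smallest = min(a, b, c)
    largest = max(a, b, c)
    # The middle value is the total minus the two extremes.
    middle = a + b + c - smallest - largest
    return smallest, middle, largest

print(sort3(10, 30, 20))  # (10, 20, 30)
```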


In [None]:

x=input('enter date in given format').split("-")
if (x[1]=='02' and (x[2]=='28' or x[2]=='29')):
    if (x[2]=='28'):
        if ((int(x[0])%4==0 and int(x[0])%100!=0) or (int(x[0])%400==0)):
            x[2]='29'
        else:
            x[2]='01'
            x[1]='03'
    else:
        x[2]='01'
        x[1]='03'
elif ((x[1]=='04' or x[1]=='06' or x[1]=='09' or x[1]=='11') and (x[2]=='30')) :
    x[2]='01'
    d=int(x[1])
    d=d+1
    if (d<10):
        x[1]="0"+str(d)
    else:
        x[1]=str(d)
elif ((x[1]=='01' or x[1]=='03' or x[1]=='05' or x[1]=='07' or x[1]=='08' or x[1]=='10' or x[1]=='12') and (x[2]=='31')):
    x[2]='01'
    d=int(x[1])
    d=d+1
    if (d<10 and d!=13):
        x[1]="0"+str(d)
    elif (d<13):
        x[1]=str(d)
    elif (d==13):
        f=int(x[0])
        f+=1
        x[0]=str(f)
        x[1]='01'
else:
    d=int(x[2])+1
    x[2]=str(d).zfill(2)  # keep the day zero-padded to match YYYY-MM-DD
y="-"
y=y.join(x)
y


enter date in given format2019-12-01


'2019-12-02'

10)	Write a program that reads a date from the user and computes its immediate successor. The date is the format YYYY-MM-DD. So, 2020-04-15 will have the successor 2020-04-16.
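
The hand-written branches above must handle month lengths, leap years and year rollover themselves. The standard library's `datetime` module already encodes those rules, so it makes a convenient cross-check. A sketch, assuming Python 3.7 or later for `date.fromisoformat`; `next_day` is a hypothetical name.

```python
from datetime import date, timedelta

def next_day(text):
    """Return the successor of a YYYY-MM-DD date, in the same format."""
    # fromisoformat raises ValueError for malformed or impossible dates.
    return (date.fromisoformat(text) + timedelta(days=1)).isoformat()

print(next_day("2020-04-15"))  # 2020-04-16
print(next_day("2019-12-31"))  # 2020-01-01
```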

In [None]:
a=input("Enter numbers separated with comma ")
a=a.split(",")
x=1
for i in a:
  x=x*int(i)
print(x)

Enter numbers separated with comma 45 ,3,2,89,72,1,10,7
121111200


11)	Compute product of a list of numbers [45 ,3,2,89,72,1,10,7]
Output: 121111200
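
For a list that is already in memory, the loop can be replaced by a single call. The sketch below assumes Python 3.8 or later for `math.prod` and shows the `functools.reduce` form alongside it for older versions.

```python
import math
from functools import reduce
from operator import mul

nums = [45, 3, 2, 89, 72, 1, 10, 7]

# math.prod multiplies all elements together (Python 3.8+).
print(math.prod(nums))       # 121111200
# Equivalent fold; the initial value 1 covers the empty list.
print(reduce(mul, nums, 1))  # 121111200
```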


In [None]:
x=input("Enter the data ")
x=x.split(",")
z=[]
for i in range(0,len(x)-1):
  z.append(int(x[i])+int(x[i+1]))
print(z)

Enter the data 5,6,8,34,89,1
[11, 14, 42, 123, 90]


12) Given Num_list = [5, 6, 8, 34, 89, 1], compute the sum of each pair of adjacent elements.
Output: Out_list = [11, 14, 42, 123, 90]
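
Each output element is the sum of two neighbouring inputs (5+6, 6+8, and so on). Pairing the list with itself shifted by one position expresses that without index arithmetic; a minimal sketch:

```python
num_list = [5, 6, 8, 34, 89, 1]

# zip stops at the shorter sequence, so the last element has no partner.
out_list = [a + b for a, b in zip(num_list, num_list[1:])]
print(out_list)  # [11, 14, 42, 123, 90]
```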


In [None]:
a=(5,6,8,3,9,1)
l=[]
l.append(a[0])
for i in range(1,len(a)):
  l.append(a[i]*l[i-1])
print(l)

[5, 30, 240, 720, 6480, 6480]


13) Given Num_tuple = (5, 6, 8, 3, 9, 1), compute the running (cumulative) product of its elements.
Output: Out_list = [5, 30, 240, 720, 6480, 6480]
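
The output is a running product: each element is the previous result multiplied by the next input. The standard library's `itertools.accumulate` computes such running totals for any binary function; a minimal sketch:

```python
from itertools import accumulate
from operator import mul

num_tuple = (5, 6, 8, 3, 9, 1)

# accumulate yields 5, 5*6, 5*6*8, ... one partial product per element.
out_list = list(accumulate(num_tuple, mul))
print(out_list)  # [5, 30, 240, 720, 6480, 6480]
```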


In [None]:
num=input("Enter a number")
lis=[]
for i in num:
    lis.append(int(i))
print(lis)

Enter a number586392
[5, 8, 6, 3, 9, 2]


14) Write Python code that takes a number and returns a list of its digits. For example, 586392 should give [5, 8, 6, 3, 9, 2].

In [None]:
s=input("Enter string ")
best=s[:1]  # a single character is the fallback answer
for i in range(len(s)):
    for j in range(i+1,len(s)):
        sub=s[i:j+1]
        # Reverse the slice itself; s[j:i-1:-1] is empty when i==0
        if sub==sub[::-1] and len(sub)>len(best):
            best=sub
print(best)

Enter string bananas
anana


15)	Write a program that finds the longest palindromic substring of a given string
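
Checking every substring and reversing it costs roughly cubic time in the length of the string. A common alternative, sketched here as an illustration rather than as the exercise's intended answer, grows a palindrome outwards from each possible centre, which is quadratic. The name `longest_palindrome` is hypothetical.

```python
def longest_palindrome(s):
    """Longest palindromic substring, found by expanding around each centre."""
    best_start, best_len = 0, min(len(s), 1)
    for center in range(len(s)):
        # Try an odd-length centre (one character) and an even-length
        # centre (the gap between two characters).
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            length = right - left - 1  # the loop overshoots by one on each side
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]

print(longest_palindrome("bananas"))  # anana
```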

In [None]:
n=int(input("Enter number of values "))
for i in range(0,n):
    x=input("Enter first string ")
    y=input("Enter second string ")
    b=y in x
    print(int(b))

Enter number of values 2
Enter first string 1010110010
Enter second string 10110
1
Enter first string 1110111011
Enter second string 10011
0


16)	 Substring Check (Bug Funny)
Given two binary strings, A (of length 10) and B (of length 5), output 1 if B is a substring of A and 0 otherwise.
First two lines of input:
1010110010          10110
1110111011           10011
First two lines of output:
1
0


In [None]:
def gcd(a,b):
    if b==0: 
        return a 
    return gcd(b,a%b) 
def countsteps(a,b,c): 
    x1=b
    x2=0
    count=1
    while((x1!=c)and(x2!=c)): 
        temp=min(x1,a-x2) 
        x2=x2+temp 
        x1=x1-temp 
        count=count+1
        if((x2==c)or(x1==c)): 
            break
        if x1==0: 
            x1=b 
            count=count+1
        if x2==a: 
            x2=0
            count=count+1
    return count 
def ispossible(a,b,c): 
    if a>b: 
        temp=a 
        a=b 
        b=temp 
    if c>b: 
        return -1
    if (c%(gcd(b,a))!=0): 
        return -1
    return(min(countsteps(b,a,c),countsteps(a,b,c)))
t=int(input("Enter number of testcases "))
for i in range(t):
  a=int(input("Enter capacity of vessel1 "))
  b=int(input("Enter capacity of vessel2 "))
  c=int(input("Enter capacity to be obtained "))
  print("Min no of steps required ",ispossible(a,b,c))

Enter number of testcases 2
Enter capacity of vessel1 2
Enter capacity of vessel2 3
Enter capacity to be obtained 1
Min no of steps required  2
Enter capacity of vessel1 1
Enter capacity of vessel2 1
Enter capacity to be obtained 1
Min no of steps required  1


17)	  POUR1 - Pouring water
Given two vessels, one of which can accommodate a litres of water and the other - b litres of water, determine the number of steps required to obtain exactly c litres of water in one of the vessels.
At the beginning both vessels are empty. The following operations are counted as 'steps':
•	emptying a vessel,
•	filling a vessel,
•	pouring water from one vessel to the other, without spilling, until one of the vessels is either full or empty.
Input
An integer t, 1<=t<=100, denoting the number of test cases, followed by t sets of input data, each consisting of three positive integers a, b, c, not larger than 40000, given in separate lines.
Output
For each set of input data, output the minimum number of steps required to obtain c litres, or -1 if this is impossible.
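
One way to check answers on small inputs is a breadth-first search over vessel states, where each of the three operations listed above is one edge. This is a sketch for cross-checking only, not the approach used in the cell above; `min_steps_bfs` is a hypothetical name.

```python
from collections import deque

def min_steps_bfs(a, b, c):
    """Fewest steps to get c litres in either vessel, or -1 if impossible."""
    start = (0, 0)  # both vessels empty
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, y), steps = queue.popleft()
        if x == c or y == c:
            return steps
        pour_xy = min(x, b - y)  # amount that can move from vessel 1 to vessel 2
        pour_yx = min(y, a - x)  # amount that can move from vessel 2 to vessel 1
        for state in (
            (a, y), (x, b),              # fill one vessel
            (0, y), (x, 0),              # empty one vessel
            (x - pour_xy, y + pour_xy),  # pour vessel 1 into vessel 2
            (x + pour_yx, y - pour_yx),  # pour vessel 2 into vessel 1
        ):
            if state not in seen:
                seen.add(state)
                queue.append((state, steps + 1))
    return -1

print(min_steps_bfs(2, 3, 1))  # 2
print(min_steps_bfs(2, 4, 3))  # -1
```

Because breadth-first search visits states in order of step count, the first state containing c litres is reached by a shortest sequence of operations.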
