# SHORTCUTS FOR JUPYTER



<li> %quickref Display the IPython Quick Reference Card
<li> %magic Display detailed documentation for all of the available magic commands
<li> %debug Enter the interactive debugger at the bottom of the last exception traceback
<li> %hist Print command input (and optionally output) history
<li> %pdb Automatically enter debugger after any exception
<li> %paste Execute pre-formatted Python code from clipboard
<li> %cpaste Open a special prompt for manually pasting Python code to be executed
<li> %reset Delete all variables / names defined in interactive namespace
<li> %page OBJECT Pretty print the object and display it through a pager
<li> %run script.py Run a Python script inside IPython
<li> %prun statement Execute statement with cProfile and report the profiler output
<li> %time statement Report the execution time of single statement
<li> %timeit statement Run a statement multiple times to compute an ensemble average execution time. Useful for timing code with very short execution time
<li> %who, %who_ls, %whos Display variables defined in interactive namespace, with varying levels of information / verbosity
<li> %xdel variable Delete a variable and attempt to clear any references to the object in the IPython internals
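
Magic commands such as %timeit only work inside IPython or Jupyter. As a rough plain-Python counterpart, the sketch below uses the standard-library `timeit` module; the timed statement and the repeat counts are arbitrary choices for illustration, not what %timeit picks internally.

```python
import timeit

# Hypothetical statement to time; any short expression works here.
stmt = "sum(range(1000))"

# Run the statement 1000 times per measurement and take 5 measurements,
# similar in spirit to how %timeit repeats very fast code many times.
runs = timeit.repeat(stmt, number=1000, repeat=5)

# Time per single execution for the fastest of the 5 measurements.
best_per_call = min(runs) / 1000
print(f"best of 5: {best_per_call * 1e6:.2f} microseconds per loop")
```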





<li> Ctrl-N or down-arrow Search forward in command history for commands starting with currently-entered text
<li>Ctrl-R Readline-style reverse history search (partial matching)
<li>Ctrl-Shift-V Paste text from clipboard
<li>Ctrl-C Interrupt currently-executing code
<li>Ctrl-A Move cursor to beginning of line
<li>Ctrl-E Move cursor to end of line
<li>Ctrl-K Delete text from cursor until end of line
<li>Ctrl-U Discard all text on current line
<li>Ctrl-F Move cursor forward one character
<li>Ctrl-B Move cursor back one character
<li>Ctrl-L Clear screen



## Interacting with the Operating System

<li>!cmd - Execute cmd in the system shell
<li>output = !cmd args - Run cmd and store the stdout in output
<li>%alias alias_name cmd - Define an alias for a system (shell) command
<li>%bookmark - Utilize IPython's directory bookmarking system
<li>%cd directory - Change system working directory to passed directory
<li>%pwd - Return the current system working directory
<li>%pushd directory - Place current directory on stack and change to target directory
<li>%popd - Change to directory popped off the top of the stack
<li>%dirs - Return a list containing the current directory stack
<li>%dhist - Print the history of visited directories
<li>%env - Return the system environment variables as a dict
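
The `output = !cmd args` form only exists in IPython. In a plain Python script, something similar can be sketched with the standard-library `subprocess` module. The command used here (the current Python interpreter printing two lines) is just a portable stand-in for a real shell command.

```python
import subprocess
import sys

# Run a command and capture its stdout as text.
# sys.executable is used only so the example runs on any platform.
result = subprocess.run(
    [sys.executable, "-c", "print('first'); print('second')"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the command fails
)

# IPython gives back a list-like object of lines; splitlines() yields a plain list.
output = result.stdout.splitlines()
print(output)
```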


In [3]:
import pandas as pd
import numpy as np
%matplotlib inline

#  A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index

In [34]:
a=pd.Series(['a','b',2,4,[1,2,3]],index=['one','two','three','four','five'])
a

one              a
two              b
three            2
four             4
five     [1, 2, 3]
dtype: object
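
The `dtype: object` above comes from mixing strings, integers and a list in one Series. A small sketch of the difference, using made-up values and labels:

```python
import pandas as pd

# Homogeneous integers keep an integer dtype.
nums = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# Mixed Python objects fall back to the generic object dtype.
mixed = pd.Series(['a', 2, [1, 2, 3]], index=['x', 'y', 'z'])

print(nums.dtype)
print(mixed.dtype)

# Values are looked up by label, much like a dict.
print(nums['y'])
```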

In [25]:
a.values #returns an array of the values

array(['a', 'b', 2, 4, list([1, 2, 3])], dtype=object)

In [26]:
a.index #returns the index labels

Index(['one', 'two', 'three', 'four', 'five'], dtype='object')

In [46]:
saldata = {'Royal': 35000, 'Abhinav': 71000, 'Mudassir': 16000, 'Sourav': 5000} #normal dict having salary data 

In [48]:
saldata #printing the value

{'Abhinav': 71000, 'Mudassir': 16000, 'Royal': 35000, 'Sourav': 5000}

In [42]:
seriesobj=pd.Series(saldata) #creating series obj from dict
seriesobj #printing the value

Abhinav     71000
Mudassir    16000
Royal       35000
Sourav       5000
dtype: int64

In [43]:
seriesobj['Royal'] #returns the value for that label, just like a dict lookup

35000

In [45]:
names = ['Akash', 'Royal', 'Abhinav', 'Mudassir']
new_data = pd.Series(saldata, index=names)
new_data

Akash           NaN
Royal       35000.0
Abhinav     71000.0
Mudassir    16000.0
dtype: float64

In this case, the 3 values found in saldata were placed in the appropriate locations, but since
no value for 'Akash' was found, it was marked as NaN (Not a Number).
To check whether the data has any NaN (null) values, pandas provides two functions:
#  isnull and notnull 


In [50]:
new_data.isnull() #Returns a boolean for each element (True - the value is NaN, False - the value is present)

Akash        True
Royal       False
Abhinav     False
Mudassir    False
dtype: bool

In [52]:
new_data.notnull() #Returns a boolean for each element (True - the value is present, False - the value is NaN)

Akash       False
Royal        True
Abhinav      True
Mudassir     True
dtype: bool
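
The boolean results of isnull and notnull can be used as masks. The sketch below uses a small made-up Series (not the notebook's `new_data`) to keep only the non-null entries and to count the missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
scores = pd.Series({'Akash': np.nan, 'Royal': 35000.0, 'Abhinav': 71000.0})

# Boolean indexing keeps only the rows where the mask is True.
present = scores[scores.notnull()]

# True counts as 1 when summed, so this is the number of NaN entries.
missing_count = int(scores.isnull().sum())

print(present)
print(missing_count)
```

`scores.dropna()` gives the same result as the boolean mask here.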

# DataFrame

A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered
collection of columns, each of which can be a different value type (numeric,
string, boolean, etc.). The DataFrame has both a row and column index; it can be
thought of as a dict of Series, all sharing the same index.
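
To make the "dict of Series" picture concrete, here is a minimal sketch with made-up values: two Series with the same index are combined into a DataFrame, and each column comes back out as a Series.

```python
import pandas as pd

# Two Series sharing the same index labels (values are illustrative).
name = pd.Series(['Royal', 'Abhinav'], index=[10, 20])
height = pd.Series([192, 165], index=[10, 20])

# The dict keys become column names; the shared index becomes the row index.
people = pd.DataFrame({'Name': name, 'height': height})

# Selecting a column returns a Series that carries the same index.
col = people['height']
print(type(col).__name__)
print(col)
```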

In [9]:
data = {'Name': ['Royal', 'Abhinav', 'Akash', 'Sourav', 'Mudassir'],
 'dob': [1996, 2001, np.nan, 1996,1997],
 'height': [192,165,170,187,185]}

df=pd.DataFrame(data) #Creating a DataFrame object from a dictionary
df

       Name     dob  height
0     Royal  1996.0     192
1   Abhinav  2001.0     165
2     Akash     NaN     170
3    Sourav  1996.0     187
4  Mudassir  1997.0     185


In [131]:
df=pd.DataFrame(data,columns=['dob','Name','Height']) #Changing order of columns; 'Height' does not match the key 'height', so that column is all NaN
df

    dob      Name Height
0  1996     Royal    NaN
1  2001   Abhinav    NaN
2  1996     Akash    NaN
3  1996    Sourav    NaN
4  1997  Mudassir    NaN


In [132]:
df=pd.DataFrame(data,index=[10,20,30,40,50]) # Customizing index values according to your preference

df

        Name   dob  height
10     Royal  1996     192
20   Abhinav  2001     165
30     Akash  1996     170
40    Sourav  1996     187
50  Mudassir  1997     185


In [133]:
df['Name']#returns all the rows corresponding to that column
df.Name #returns all the rows corresponding to that column

10       Royal
20     Abhinav
30       Akash
40      Sourav
50    Mudassir
Name: Name, dtype: object

# df[column] works for any column name, but df.column only works when the column name is a valid Python variable name
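
A short sketch of that difference, using a made-up frame with one column whose name contains a space:

```python
import pandas as pd

frame = pd.DataFrame({'height': [192, 165], 'birth year': [1996, 2001]})

# 'height' is a valid Python identifier, so attribute access works.
by_attr = frame.height

# 'birth year' contains a space, so only bracket access can be written;
# frame.birth year would be a SyntaxError.
by_key = frame['birth year']

print(by_attr.tolist())
print(by_key.tolist())
```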

In [134]:
new_df=pd.DataFrame(df,index=[10,20,30,40,50,60,70])
new_df

        Name     dob  height
10     Royal  1996.0   192.0
20   Abhinav  2001.0   165.0
30     Akash  1996.0   170.0
40    Sourav  1996.0   187.0
50  Mudassir  1997.0   185.0
60       NaN     NaN     NaN
70       NaN     NaN     NaN


In [135]:
df.head() #Returns first five rows as default
df.head(3) #Returns specified number of rows 

       Name   dob  height
10    Royal  1996     192
20  Abhinav  2001     165
30    Akash  1996     170


In [136]:
df.loc[30] #retrieving a row by its index label

Name      Akash
dob        1996
height      170
Name: 30, dtype: object

In [137]:
df['height']=np.arange(5.) #Changing the value of Entire column

df

        Name   dob  height
10     Royal  1996     0.0
20   Abhinav  2001     1.0
30     Akash  1996     2.0
40    Sourav  1996     3.0
50  Mudassir  1997     4.0


In [138]:
a=[190,180,160,156,166]
df.height=a
# Make sure to check the length of the list you are assigning to the column
#Number of rows should match the number of values to be assigned
df

        Name   dob  height
10     Royal  1996     190
20   Abhinav  2001     180
30     Akash  1996     160
40    Sourav  1996     156
50  Mudassir  1997     166
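
The length check mentioned in the comments can be seen in a minimal sketch (made-up frame): assigning a list with the wrong number of values raises a ValueError instead of silently padding.

```python
import pandas as pd

frame = pd.DataFrame({'Name': ['Royal', 'Abhinav', 'Akash']})

try:
    # Only 2 values for 3 rows: pandas refuses the assignment.
    frame['height'] = [190, 180]
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)

# A list with one value per row is accepted.
frame['height'] = [190, 180, 160]
print(frame)
```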


# APPENDING A NEW COLUMN TO EXISTING DATAFRAME

In [139]:
sal=np.arange(70000,120000,10000)
df['salary']=pd.Series(sal) 
#pd.Series(sal) gets the default index 0-4, which matches none of df's labels (10-50), so the column is added but filled with NaN
df

        Name   dob  height  salary
10     Royal  1996     190     NaN
20   Abhinav  2001     180     NaN
30     Akash  1996     160     NaN
40    Sourav  1996     156     NaN
50  Mudassir  1997     166     NaN


In [140]:
df['salary']=pd.Series(sal,index=df.index) 
df

        Name   dob  height  salary
10     Royal  1996     190   70000
20   Abhinav  2001     180   80000
30     Akash  1996     160   90000
40    Sourav  1996     156  100000
50  Mudassir  1997     166  110000
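
The two salary assignments above differ only in how the Series index lines up with the DataFrame index. The sketch below repeats the idea on a smaller made-up frame and also shows that a plain NumPy array is assigned by position, with no alignment involved.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Name': ['Royal', 'Abhinav']}, index=[10, 20])
sal = np.array([70000, 80000])

# Series with default labels 0 and 1: no label matches 10 or 20, so NaN.
frame['misaligned'] = pd.Series(sal)

# Series built with the frame's own index: labels match, values are kept.
frame['aligned'] = pd.Series(sal, index=frame.index)

# A bare array has no labels, so values are placed row by row.
frame['from_array'] = sal

print(frame)
```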


In [141]:
df.T # To take a transpose of the data set

           10       20     30      40        50
Name    Royal  Abhinav  Akash  Sourav  Mudassir
dob      1996     2001   1996    1996      1997
height    190      180    160     156       166
salary  70000    80000  90000  100000    110000


# DELETING COLUMNS AND ROWS

In [145]:
df1=pd.DataFrame(df,index=df.index)
df1

        Name   dob  height  salary
10     Royal  1996     190   70000
20   Abhinav  2001     180   80000
30     Akash  1996     160   90000
40    Sourav  1996     156  100000
50  Mudassir  1997     166  110000


In [None]:
del df1['salary']

In [150]:
df1

        Name   dob  height
10     Royal  1996     190
20   Abhinav  2001     180
30     Akash  1996     160
40    Sourav  1996     156
50  Mudassir  1997     166


In [152]:
dfT=pd.DataFrame(df1.T,index=df1.T.index) #Creating a new transposed data frame 
dfT

           10       20     30      40        50
Name    Royal  Abhinav  Akash  Sourav  Mudassir
dob      1996     2001   1996    1996      1997
height    190      180    160     156       166


In [160]:
dfT.columns

Int64Index([10, 20, 30, 40, 50], dtype='int64')

# To delete multiple columns at the same time in pandas, pass a list of column names to drop, as shown below. Pass inplace=True to modify the existing DataFrame; without it, drop returns a new DataFrame and leaves the original unchanged.

In [162]:
dfT.drop([10,20,30], axis=1, inplace=True) #Deleting multiple columns

In [163]:
dfT

            40        50
Name    Sourav  Mudassir
dob       1996      1997
height     156       166
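
For comparison, a small sketch (made-up frame and column names) of drop with and without inplace=True:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Without inplace, drop returns a new DataFrame; frame still has a, b and c.
trimmed = frame.drop(['a', 'b'], axis=1)
columns_after_copy_drop = list(frame.columns)

# With inplace=True, frame itself is modified and the call returns None.
returned = frame.drop(['c'], axis=1, inplace=True)

print(list(trimmed.columns))
print(columns_after_copy_drop)
print(list(frame.columns))
```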


In [164]:
df

        Name   dob  height
10     Royal  1996     190
20   Abhinav  2001     180
30     Akash  1996     160
40    Sourav  1996     156
50  Mudassir  1997     166


# Some Index methods and properties

**Method**           Description
<li>**append**       Concatenate with additional Index objects, producing a new Index
<li>**difference**   Compute set difference as an Index
<li>**intersection**  Compute set intersection
<li>**union**         Compute set union
<li>**isin**          Compute boolean array indicating whether each value is contained in the passed collection
<li>**delete**        Compute new Index with element at index i deleted
<li>**drop**          Compute new Index by deleting passed values
<li>**insert**        Compute new Index by inserting element at index i
<li>**is_monotonic_increasing**  Returns True if each element is greater than or equal to the previous element (named is_monotonic in older pandas versions)
<li>**is_unique**     Returns True if the Index has no duplicate values
<li>**unique**        Compute the array of unique values in the Index
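
A few of these methods in action, as a quick sketch on two made-up Index objects:

```python
import pandas as pd

left = pd.Index([10, 20, 30])
right = pd.Index([30, 40])

union = left.union(right)          # labels in either Index
inter = left.intersection(right)   # labels in both
diff = left.difference(right)      # labels in left but not in right
mask = left.isin([20, 30])         # boolean array, one entry per label

print(list(union))
print(list(inter))
print(list(diff))
print(mask.tolist())
print(left.is_unique, left.is_monotonic_increasing)
```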

In [None]:
df.index.name='Roll number' #Assigning name to index

In [170]:
df

                 Name   dob  height
Roll number
10              Royal  1996     190
20            Abhinav  2001     180
30              Akash  1996     160
40             Sourav  1996     156
50           Mudassir  1997     166


#  Indexing options with DataFrame

In [173]:
df.loc[30] #Selects single row or subset of rows from the DataFrame by label

Name      Akash
dob        1996
height      160
Name: 30, dtype: object

In [None]:
df.loc[:, val]     #Selects single column or subset of columns by label
df.loc[val1, val2] #Select both rows and columns by label


In [188]:
df.iloc[:30]   #Selects rows by integer position; :30 asks for the first 30 positions, so all 5 rows are returned

                 Name   dob  height
Roll number
10              Royal  1996     190
20            Abhinav  2001     180
30              Akash  1996     160
40             Sourav  1996     156
50           Mudassir  1997     166
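
Since the index labels here are integers, loc and iloc are easy to mix up. The sketch below uses a smaller made-up frame to show the same cell reached by label and by position, and how the two kinds of slices treat their end point.

```python
import pandas as pd

frame = pd.DataFrame(
    {'Name': ['Royal', 'Abhinav', 'Akash'], 'height': [190, 180, 160]},
    index=[10, 20, 30],
)

by_label = frame.loc[20, 'Name']    # row label 20, column label 'Name'
by_position = frame.iloc[1, 0]      # second row, first column

label_slice = frame.loc[10:20]      # label slices include the end label
pos_slice = frame.iloc[0:2]         # position slices exclude the end position

print(by_label, by_position)
print(len(label_slice), len(pos_slice))
```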


# Function Application and Mapping

In [192]:
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('ROY'),index=['a','b','c','d'])
frame

          R         O         Y
a -1.431151  0.717391 -1.259815
b  0.637158  0.685574 -0.861992
c  0.648681 -0.022782  1.747191
d  0.641393 -1.499452  1.583230


In [193]:
np.abs(frame)

          R         O         Y
a  1.431151  0.717391  1.259815
b  0.637158  0.685574  0.861992
c  0.648681  0.022782  1.747191
d  0.641393  1.499452  1.583230
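
NumPy functions such as np.abs work element by element. For a function that should see a whole column or row at once, DataFrame.apply is one option. The sketch below uses fixed made-up numbers instead of random ones so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1.0, -2.0], [-3.0, 4.0]],
                     columns=['R', 'O'], index=['a', 'b'])

# Element-wise: every value is replaced by its absolute value.
absolute = np.abs(frame)

# Column-wise (default axis=0): the function receives each column as a Series.
col_range = frame.apply(lambda col: col.max() - col.min())

# Row-wise (axis=1): the function receives each row as a Series.
row_range = frame.apply(lambda row: row.max() - row.min(), axis=1)

print(absolute)
print(col_range)
print(row_range)
```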


# TO CHECK NULL OR NAN VALUES

In [10]:
df.isnull().any() #returns one boolean per column: True if the column contains any NaN

Name      False
dob        True
height    False
dtype: bool

In [11]:
df.isnull().sum() #returns the number of NaN values in each column

Name      0
dob       1
height    0
dtype: int64

# Replace any particular set of values in a column with any other value

In [None]:
df['column name'] = df['column name'].replace('A', 5)
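
A runnable version of the same pattern, on a made-up frame (the column names and values are placeholders):

```python
import pandas as pd

frame = pd.DataFrame({'grade': ['A', 'B', 'A'],
                      'city': ['NY', 'LA', 'NY']})

# Replace every 'A' in the grade column with 5; other values are untouched.
frame['grade'] = frame['grade'].replace('A', 5)

# A dict maps several old values to new ones in a single call.
frame['city'] = frame['city'].replace({'NY': 'New York', 'LA': 'Los Angeles'})

print(frame)
```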
