# **Getting Started with Python!**

## NumPy

In the previous notebook we learnt the basics of Python and why it is so popular for AI applications. We also learnt about the data types available in Python, what modules and packages are, and the different control flow options. Now we turn to **NumPy** (short for Numerical 'Num' Python 'Py'), one of the most popular Python packages for scientific calculations, especially those involving arrays. Let's get started!

## Arrays

NumPy arrays are grids of values that all share the same data type. Each element of a NumPy array has a non-negative index associated with it. A useful term when dealing with arrays is **shape**, which gives the size of the array along each dimension. So why do we use NumPy arrays instead of standard Python lists? NumPy arrays generally use less memory and run faster than lists. This matters in data science because we commonly deal with large amounts of data, where the difference between processing Python lists and NumPy arrays adds up. NumPy also provides multi-dimensional array and matrix data structures, along with trigonometric, statistical, and algebraic routines that operate on them.
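
To make the comparison with lists concrete, here is a minimal sketch. The names `values_list` and `values_array` and the size of 1,000 elements are illustrative choices, not part of the original notebook. It shows that an array applies an operation to every element at once and stores each element in a fixed number of bytes.

```python
import numpy as np

# Illustrative data: the same 1,000 integers as a list and as a NumPy array
values_list = list(range(1000))
values_array = np.arange(1000, dtype=np.int64)

# With a list, elementwise work needs an explicit loop or comprehension
doubled_list = [v * 2 for v in values_list]

# With an array, the operation is applied to every element at once
doubled_array = values_array * 2

# Both approaches give the same numbers
print(doubled_array.tolist() == doubled_list)  # True

# Every element has the same fixed size, so the data takes itemsize * size bytes
print(values_array.itemsize, values_array.nbytes)  # 8 8000
```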

In [None]:
import numpy as np #importing numpy package 
a = np.array([1,2,3]) #initializing an array with the elements given in the brackets
print (a)

Let's check the datatype for the array 'a'

In [None]:
print (type(a)) #type() shows the class of the object: <class 'numpy.ndarray'>

In [None]:
#Creating an array with more than one dimension
b = np.array([[1, 2], [3, 4]]) 
print (b)

In [None]:
#let's check the shape of the arrays created so far
print("Shape of array 'a' is:",(a.shape))
print("Shape of array 'b' is:",(b.shape))

In [None]:
#accessing elements of the array using the index
print(a[0], a[1], a[2]) 
print(b[0][0]) #printing the element in the first row, first column

In [None]:
#another way of printing a particular element of an array
print(b[0,1])

In [None]:
#using the arange function to create an array
c=np.arange(4) #creates an array of four elements, starting from 0 by default
print (c)
d=np.arange(1,7) #specifying start point and end point (the end point itself is excluded)
print (d)
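
`np.arange` also accepts an optional third argument, the step between consecutive values. The short sketch below assumes a step of 2, and the name `evens` is only illustrative.

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]
```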

In [None]:
# Create the following array with given shape (3, 4). Name the array 'new'
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
new = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print (new)

In [None]:
# Slicing is being used to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; new_sub is the following array of shape (2, 2):

new_sub = new[:2, 1:3]
print(new_sub)
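
One consequence of slicing worth knowing: a slice is a view onto the same data, so changing the slice also changes the original array. The sketch below rebuilds the same 3 x 4 array under the illustrative names `grid` and `grid_sub`, so that the array `new` used above is left untouched.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
grid_sub = grid[:2, 1:3]     # a view of rows 0-1, columns 1-2

grid_sub[0, 0] = 77          # grid_sub[0, 0] is the same element as grid[0, 1]
print(grid[0, 1])            # 77

# Use .copy() when an independent subarray is needed
independent = grid[:2, 1:3].copy()
independent[0, 0] = 0
print(grid[0, 1])            # still 77
```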

**Integer array indexing** 

When you index into NumPy arrays using slicing, the resulting array is always a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Do you recall slicing? We went through it in the previous notebook! Remember, indexes in Python always start from 0 for the first element.

Here is an example:

In [None]:
a = np.array([[1,2], [3, 4], [5, 6]])

#The print statement below outputs [1 4 5]. Can you see how?
#The value '1' is at position 0,0: the first 'box' and the first 'item' in that 'box'
#The value '4' is at position 1,1: the second 'box' and the second 'item' in that 'box'
#Can you now figure out how you would get the value '5'?

print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"


#The above can also be written as:
# a[0,0] '1'
# a[1,1] '4'
# a[2,0] '5'
# This is an example of integer array indexing. Look closely: can you tell the difference between this statement and the last?
b=(a[[0, 1, 2], [0, 1, 0]]) # The first list holds the row indices, the second the column indices


print(b)  # Prints "[1 4 5]"
print(b.shape) #Shape is (3,)

In [None]:
# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

We can also use integer array indexing to select or mutate one element from each row of an array.

In [None]:
# Create an array of indices
b = np.array([1, 0, 1])
print(b)

# Select one element from each row of a using the indices in b
print(a[np.arange(3), b])

In [None]:
# Mutate one element from each row of a using the indices in b
a[np.arange(3), b] += 10

print(a)

## Datatypes in NumPy Arrays

In [None]:
roll_no = np.array([23, 46])   # Let numpy choose the datatype
print(roll_no.dtype)         # Prints "int64" on most platforms

marks = np.array([65.5, 88.0])   # Let numpy choose the datatype
print(marks.dtype)             # Prints "float64"

In [None]:
x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"
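
An existing array can also be converted to another datatype with `astype()`. The following is a small sketch; the names `rounded_down` and `mixed` are illustrative and not part of the original notebook.

```python
import numpy as np

marks = np.array([65.5, 88.0])          # float64
rounded_down = marks.astype(np.int64)   # the fractional part is dropped, not rounded
print(rounded_down)                     # [65 88]
print(rounded_down.dtype)               # int64

# Mixing integers and floats in one array converts everything to float
mixed = np.array([1, 2.5])
print(mixed.dtype)                      # float64
```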

**Array Mathematics**

Basic mathematical operations are applied to arrays elementwise. They are available both as operators (+, -, *, and so on) and as functions in the NumPy package itself.

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

In [None]:
#Write code to create two arrays X and Y with the following elements and print their difference
# X = [[13.0 7.0]
#     [4.0 47.0]]
# Y = [[16.0 14.0] 
#     [1.0 23.0]]

In [None]:
X = np.array([[13.0,7.0],[4.0,47.0]], dtype=np.float64)
Y = np.array([[16.0,14.0],[1.0,23.0]], dtype=np.float64)
print (X-Y)
print(np.subtract(X, Y))

In [None]:
#to perform elementwise multiplication
print(x * y)
print(np.multiply(x, y))

Note that this is different from performing matrix multiplication. The * operator is only used for elementwise multiplication.
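
For comparison, here is a minimal sketch of matrix multiplication on the same two arrays, using the `@` operator and the equivalent `np.matmul` function. The names `elementwise` and `matrix_product` are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise product: each entry is x[i, j] * y[i, j]
elementwise = x * y
print(elementwise)
# [[ 5. 12.]
#  [21. 32.]]

# Matrix product: rows of x combined with columns of y
matrix_product = x @ y
print(matrix_product)
# [[19. 22.]
#  [43. 50.]]

# np.matmul gives the same result as the @ operator
print(np.array_equal(matrix_product, np.matmul(x, y)))  # True
```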

In [None]:
#to perform elementwise division
print(x / y)
print(np.divide(x, y))

**Sorting NumPy Arrays**

In [None]:
a = np.array([1, 2, 3, 4, 5, 2, 1])
print(a)

In [None]:
np.sort(a) #function for sorting elements in ascending order
print (np.sort(a))

In [None]:
np.flip(np.sort(a)) #sorting in ascending order, then reversing to get descending order
print (np.flip(np.sort(a)))

In [None]:
np.flip(np.sort(a))[:2]  #sorting in descending order, then slicing to keep the two largest elements
print (np.flip(np.sort(a))[:2])
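
Note that `np.sort()` returns a sorted copy and leaves the original array unchanged, whereas the array's own `sort()` method sorts it in place. The sketch below uses the illustrative names `values` and `sorted_copy` to show the difference.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 2, 1])

sorted_copy = np.sort(values)   # returns a new, sorted array
print(values)                   # [1 2 3 4 5 2 1]  (unchanged)
print(sorted_copy)              # [1 1 2 2 3 4 5]

values.sort()                   # sorts the array itself and returns None
print(values)                   # [1 1 2 2 3 4 5]
```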

## Manipulating NumPy Arrays

The reshape() function changes the shape of an array without altering its elements. Its use is illustrated below.

In [None]:
x=np.arange(12) #creating a one-dimensional array with 12 elements
print(x)

In [None]:
y=x.reshape(6,2) #changing shape from (12,) to a 6*2 array
print(y)
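
Two details of `reshape()` are worth a quick sketch: one dimension can be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. The names `seq` and `table` are illustrative.

```python
import numpy as np

seq = np.arange(12)

# -1 asks NumPy to work out the missing dimension: 12 elements / 3 rows = 4 columns
table = seq.reshape(3, -1)
print(table.shape)  # (3, 4)

# 12 elements cannot fill a 5*2 array, so NumPy raises a ValueError
try:
    seq.reshape(5, 2)
except ValueError as err:
    print("reshape failed:", err)
```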

Just like lists or strings, arrays can be concatenated. In the following example we concatenate three one-dimensional arrays into one array. The elements of the second array are appended to the first array, and then the elements of the third array are appended.

In [None]:
x = np.array([3,12])
y = np.array([12,4,7])
z = np.array([2,6,8])
print(x)
print(y)
print(z)
c = np.concatenate((x,y,z)) #function to perform concatenation
print(c)

When concatenating multidimensional arrays, we can choose the axis along which they are joined. The arrays must have the same shape, except along the axis being concatenated. The default is axis = 0; the example below uses axis = 1:

In [None]:
x = np.array([[2,4],[7,9]])
print(x)
x=x.reshape(1,4)
y = np.array([10,12,14,16])
print(y)
y=y.reshape(1,4)
c = np.concatenate((x,y),axis=1)
print(c)
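
To see how the axis changes the result, here is a minimal sketch with small two-dimensional arrays. The names `top`, `bottom`, and `side` are illustrative and not part of the original notebook.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])   # shape (2, 2)
bottom = np.array([[5, 6]])        # shape (1, 2)
side = np.array([[7], [8]])        # shape (2, 1)

# axis=0 stacks rows: the arrays must agree in the number of columns
stacked_rows = np.concatenate((top, bottom), axis=0)
print(stacked_rows)
# [[1 2]
#  [3 4]
#  [5 6]]

# axis=1 places arrays side by side: they must agree in the number of rows
stacked_cols = np.concatenate((top, side), axis=1)
print(stacked_cols)
# [[1 2 7]
#  [3 4 8]]
```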

**Pro-tip**: To keep a list of useful functions and commands at your fingertips, you can save this [cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)!

# **Pandas**

Pandas is built on top of NumPy and is one of the most popular open-source packages available in Python. Its main utility lies in providing a large number of functions for handling real-world data.

You do not need to fully understand the capabilities of Pandas at present; the examples given just demonstrate some of its useful features. In later sessions we will go through in full detail how to use Pandas in an ML workflow.

However, we suggest paying special attention to the syntax and to the flow and order of the code going forward, as they are generally used in a similar fashion.

## Importing Pandas

In [None]:
#Do you recognize the keywords here? Both the import statement and the library alias will be common going forward
import pandas as pd

**DataFrame**

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

DataFrames accept many different kinds of input. If you would like to learn more, feel free to search online or go to the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html). <br>For the purpose of this example, we shall demonstrate one useful way to use DataFrames. <br><br>Let's say we are interested in the weather. Specifically, we are interested in data from [Open Government Data](https://data.gov.in/keywords/weather) and have obtained a file called JaipurFinalCleanData.csv. <br><br>A Comma-Separated Values (CSV) file, as the name implies, contains a set of data separated by commas. Files like these will be familiar to users of spreadsheet software like Excel or Google Sheets.
<br><br>In this example, we will open a CSV file in Python and load it into a DataFrame.
<br><br> **Bonus Exercise:** Try loading in the file, and follow the rest of the exercise. Use your friend, the search engine, to find a way to load your file into Colab!

In [None]:
#If working locally, using Anaconda for example, the csv file should be in the same folder as this notebook.
#This example will raise an error if the file has not been uploaded



#First, load the file into a variable that holds the DataFrame
dataframe = pd.read_csv("JaipurFinalCleanData.csv")
print(dataframe.head())

The .head() function shows the first 5 rows of a DataFrame by default. It also helps you check whether the CSV file has loaded correctly.

In [None]:
#You may also pass an argument to the function to display more rows, like so.
print(dataframe.head(10))

In [None]:
#Another useful thing you could do is to check all the datatypes within the dataframe 
dataframe.dtypes
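
If the CSV file is not at hand, the same methods can be tried on a small DataFrame built in memory from a dictionary. This is only a sketch: the column names and values below are made up for illustration and are not taken from JaipurFinalCleanData.csv.

```python
import pandas as pd

# Made-up values for illustration only; not taken from JaipurFinalCleanData.csv
sample = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed"],
    "max_temp_c": [31.5, 33.0, 29.8],
    "rain": [False, False, True],
})

print(sample.head(2))   # first two rows only
print(sample.dtypes)    # one datatype per column
```

Each key of the dictionary becomes a column, and pandas picks a datatype for each column much as NumPy does for an array.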

Have you noticed? Using print() on the DataFrame shows many values, but how are you supposed to make sense of all of them? That is where data visualization comes in: graphs, plots, and charts help us better understand what we are looking at and what we might be looking for. In the following notebook we shall look into a useful and common library, **matplotlib**, which helps us with this.