# 1. Introduction to Python for Data Science

We are living in a world that’s drowning in data. As practitioners like to say, **data** is the new fuel, the new electricity.


But it would not make much sense to limit ourselves to merely having data, because data has always existed: **from Johannes Kepler** in early seventeenth-century Prague, who tried to understand the movement of the planets from records of their motion around the sun, all the way to a medical doctor in Kenya who records information about a patient before applying any treatment.


Data has always been at the center of (applied) science. What makes it so attractive nowadays are the tools we use to handle it.



Python is one of these tools. The purpose of this section is to give the reader a glimpse of how to get the best out of Python in order to carry out a data science project efficiently.


Over the last couple of decades, Python has emerged as a **first-class tool** for scientific computing tasks, including the analysis and visualization of large datasets. 


Though it was not specifically designed with data analysis or scientific computing in mind, it has become a language of choice for data science projects.

This is mainly thanks to its wide community of users and its large, active ecosystem of third-party packages, such as:

1. **$\Large NumPy \quad$** for manipulation of homogeneous array-based data,
2. **$\Large Pandas \quad$** for manipulation of heterogeneous and labeled data,
3. **$\Large SciPy \quad$** for common scientific computing tasks,
4. **$\Large Matplotlib \quad$** for publication-quality visualizations,
5. **$\Large Scikit\text{-}Learn \quad$** for machine learning,
6. **$\Large IPython \quad$** for interactive execution and sharing of code.

These are the reasons why data scientists stick to it.

With the recent rise of Machine Learning and Deep Learning and their spectacular success, giant companies like Google and Facebook have put together some amazing educational resources that allow practitioners to carry out their own data science projects.


Following a code-light philosophy for expressing the concept-heavy ideas of Machine Learning, the following libraries have been released:

1. **TensorFlow** (Deep Learning library with a Python interface, developed by Google)
2. **PyTorch** (Deep Learning library with a Python interface, developed by Facebook)

Most of the projects released by those companies make extensive use of these libraries, and they are **free and open source!**

# 2. The basics of Python

### Installing Python

Python can be downloaded from **python.org**.

But if you find the process a bit tedious, we strongly recommend installing **the Anaconda distribution**, which already includes most of the libraries needed for data science.

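Once Python is installed, a quick way to see which of the libraries mentioned above are available is to ask Python itself. The snippet below is a minimal sketch: the list of package names is an illustrative assumption (note that Scikit-Learn is imported under the name `sklearn`), and it only checks whether each package can be found, not which version is installed.

```python
import importlib.util
import sys

# Import names to look for; this list is illustrative, not exhaustive.
packages = ["numpy", "pandas", "scipy", "matplotlib", "sklearn"]

print("Python version:", sys.version.split()[0])

# find_spec returns None when a top-level package cannot be found.
available = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, found in available.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```
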
### Launching Python

There are basically two ways to launch Python. Either from the terminal, by typing:

    $ python

Or through a Jupyter Notebook, by typing in the terminal:

    $ jupyter notebook

### Data Structures

In the core Python language, some features are more important for data analysis than others. In this chapter, you’ll look at the most essential of them, such as lists, strings, string functions, list comprehensions, and counters.

### Values and types

A value is one of the basic things a program works with, like a letter or a number.
These values belong to different types:


In [None]:
10  # an integer
"apple"  # a string
7.5  # a floating-point number


### ==> Variables

One of the most powerful features of Python is the ability to
manipulate variables. A variable is a name that refers to a value.
An assignment statement creates new variables and gives them values.

In [None]:
a = 4
b = "data science is cool"
pi = 3.1415926535897931
print(a)
print(b)
print(pi)

In [None]:
type(a), type(b), type(pi)

### ==> List

A **list** is a sequence of values. Unlike a string, whose values are all characters, the values in a list can be of any type. The values in a list are called elements or sometimes items.

In [None]:
d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]
print(d, f)


To access the elements of a list, we use their index (starting from 0). We can also check the type of the object and the number of elements it contains.

In [None]:
#the type of the object
type(f)

In [None]:
#the length of the list
len(d)

In [None]:
print(d[0], f[2])

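Indexing can be sketched a little further. The example below redefines the two lists so that it runs on its own; negative indices and slices are not covered in the text above and are shown here only as a small extension of the same idea.

```python
d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]

first = d[0]     # 2: indices start at 0
last = f[-1]     # 'tomato': negative indices count from the end
middle = d[1:3]  # [9, 14]: the stop index 3 is excluded

print(first, last, middle)
```
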
Lists are **mutable**, which means you can change their elements. **Tuples**, by contrast, behave like lists in most respects but are not mutable, as the last cell below shows.

In [None]:
f[1] = "pineapple"
print(f)

In [None]:
g = ("R" , "Java" , "C++")
print(g)

In [None]:
type(g)

In [None]:
g[1] = "perl"  # raises TypeError: tuples do not support item assignment

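The failed assignment can be made explicit by catching the error. This is a minimal sketch that redefines the tuple `g` so it runs on its own; the name `g2` is introduced here purely for illustration.

```python
g = ("R", "Java", "C++")

try:
    g[1] = "perl"  # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# To "change" a tuple, build a new one from pieces of the old one.
g2 = g[:1] + ("perl",) + g[2:]
print(g2)  # ('R', 'perl', 'C++')
```

The original tuple `g` is left untouched; only the new tuple `g2` contains the replacement.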
### ==> List comprehension

The most common way to traverse the elements of a list is with a for loop. The
syntax is the same as for strings:

In [None]:
for i in f: 
    print(i)

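A list comprehension expresses the same kind of traversal in a single expression and collects the results in a new list. The sketch below redefines the fruit list so it runs on its own; the names `upper`, `lengths` and `long_names` are chosen here for illustration.

```python
f = ["orange", "pineapple", "banana", "tomato"]

# Apply an operation to every element.
upper = [fruit.upper() for fruit in f]

# Compute a value from every element.
lengths = [len(fruit) for fruit in f]

# Keep only the elements that satisfy a condition.
long_names = [fruit for fruit in f if len(fruit) > 6]

print(upper)
print(lengths)     # [6, 9, 6, 6]
print(long_names)  # ['pineapple']
```
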
### ==> Functions

A function is a named sequence of statements that
performs a computation. When you define a function, you specify the name and
the sequence of statements. Later, you can **“call”** the function by name.

In [None]:
def add(x, y):
    z = x + y
    return z

In [None]:
add(5, 6)

In [None]:
def add_two(s):
    t = s + '_two'
    return t
    

In [None]:
add_two('forty')

In [None]:
add_two('fifty')

## Modules and Packages

A module is a file containing Python statements and definitions.

• Modules allow us to write code, and separate it out from other code into a different file.

• We can use the import statement to include the code of another file

To import a module in Python, we type the following at the Python prompt:


    import module_name

### ==> Examples

In [None]:
import math  # the math module
import re  # the regular expression module
import random  # the random module

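Once a module is imported, its functions are reached through the module name followed by a dot. The sketch below uses one standard-library function from each of the three modules; the sample sentence and the fruit list are made up for illustration.

```python
import math
import random
import re

# math: numerical functions
root = math.sqrt(16)
print(root)  # 4.0

# re: find every run of digits in a string
numbers = re.findall(r"\d+", "3 apples and 12 bananas")
print(numbers)  # ['3', '12']

# random: pick one element at random (seeded so the pick is repeatable)
fruits = ["orange", "apple", "banana"]
random.seed(0)
pick = random.choice(fruits)
print(pick)
```
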

However, some module names are relatively long to type. To save on typing, we can import a module under a shorter name, as follows:


    import module_name as new_name

### ==> Examples

In [None]:
import numpy as np
import pandas as pd
import sklearn as sk

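After an aliased import, the short name is used everywhere the module name would otherwise appear. A minimal sketch, assuming NumPy is installed; the array `values` is made up for illustration.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# The alias np gives access to everything in the numpy module.
average = np.mean(values)
print(average)      # 2.5

# The alias refers to the same module; only the local name differs.
print(np.__name__)  # numpy
```
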
## Some of the libraries

### ==> NumPy

The NumPy library is designed for working with **arrays** and **matrix operations**.

In [None]:
import numpy as np

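To give a first idea of what array and matrix operations look like, here is a minimal sketch. The arrays are made up for illustration; the point is that arithmetic applies to every element at once, and that `@` computes a matrix product.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Arithmetic is applied element by element, with no explicit loop.
total = a + b
doubled = a * 2
print(total)    # [11 22 33]
print(doubled)  # [2 4 6]

# The @ operator computes a matrix-vector (or matrix-matrix) product.
m = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
product = m @ v
print(product)  # [3 7]
```
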
In [None]:
# A vector of 10 evenly spaced integers, from 0 to 9 in steps of 1
n1 = np.arange(10)  
n1

In [None]:
# A vector of 6 evenly spaced numbers between 0 and 50 (both included)
n2 = np.linspace(0, 50, 6)
n2

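The difference between `np.arange` and `np.linspace` is worth a small side-by-side sketch: the former takes a step and excludes the stop value, the latter takes a number of points and includes it. The names `by_step`, `by_count` and `step` are introduced here for illustration.

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 60, 10)

# linspace: start, stop (included), number of points
by_count = np.linspace(0, 50, 6)

# With 6 points there are 5 intervals, so the spacing is (50 - 0) / (6 - 1).
step = (50 - 0) / (6 - 1)

print(by_step)   # values 0, 10, 20, 30, 40, 50 (integers)
print(by_count)  # values 0, 10, 20, 30, 40, 50 (floats)
print(step)      # 10.0
```
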
In [None]:
# A vector of 10 random integers from 25 (included) to 45 (excluded)
n3 = np.random.randint(25, 45, 10)
n3

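Because the values are random, they change on every run. The sketch below fixes the seed so that the draw becomes repeatable, and checks that the upper bound is indeed excluded; the seed value 0 is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the draw is reproducible
n3 = np.random.randint(25, 45, 10)

# Every value lies in the half-open range [25, 45).
print(n3.min() >= 25, n3.max() < 45)  # True True

# Resetting the same seed reproduces the same draw.
np.random.seed(0)
again = np.random.randint(25, 45, 10)
print(np.array_equal(n3, again))  # True
```
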
In [None]:
# A vector of 6 ones
np.ones(6)

In [None]:
# A vector of 4 zeros
np.zeros(4)

To access the elements of an array, we use the same indexing as for lists.

In [None]:
n4 = np.random.rand(5)
n4

In [None]:
#the first element
n4[0]

In [None]:
#the last element
n4[-1]

To reshape an array into a matrix:

In [None]:
n5 = np.random.randint(0,50,12)
n5

In [None]:
n5.reshape(4,3)
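
Reshaping only works when the new shape holds exactly as many elements as the array. The sketch below uses `np.arange(12)` instead of random values so that the results are predictable; the names `m` and `auto` are introduced here for illustration.

```python
import numpy as np

n5 = np.arange(12)  # the integers 0 to 11

# 4 rows x 3 columns = 12 elements, so this works.
m = n5.reshape(4, 3)
print(m.shape)  # (4, 3)
print(m[1, 2])  # 5: row 1, column 2

# -1 lets NumPy infer one dimension from the total size.
auto = n5.reshape(-1, 6)
print(auto.shape)  # (2, 6)

# 5 x 3 = 15 does not match 12 elements, so this fails.
try:
    n5.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```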