# Python Basics for Data Science 
### IntroPython2.1 Python Basics-Operators  
### IntroPython2.2 Python Basics-Variables and Data Types
### IntroPython2.3 Python Basics-Data Structures
### IntroPython2.4 Python Basics-Functions and Methods
### IntroPython2.5 Python Basics-Create Our Own Function and Lambda
### IntroPython2.6 Python Basics-If Statement
### IntroPython2.7 Python Basics-Loops
### IntroPython2.8 Python Basics-Python Syntax Essentials and Best Practice
### IntroPython2.9 Python Basics-Import Statement and Important Built-in Modules
***

### Modules are divided into three groups:

1. `The modules of the Python Standard Library`: These are easy to get because they come with Python 3 by default. Simply type `import` followed by the name of the module, and from then on you can use that module in your code.

2. `More advanced and more specialized modules`: Some modules are not part of the standard library. For these, you first have to install the corresponding packages on your data server. You will see that in data science we use many of these “external” packages. (Ones you might have heard of include pandas, numpy, matplotlib, and scikit-learn.)

3. `Your own modules`: Yes, you can write your own modules, too! (We won’t cover that here.)

#### Either way, `import` is a really powerful concept in Python, because it lets you keep expanding your toolset, almost without limit, as you deal with different data science challenges.

## The most important Python Built-in Modules for Data Scientists

Okay, now that you get the concept, it’s time to see it in practice. As I mentioned, the Python Standard Library comes with well over a hundred built-in modules. From those, I have picked the five most important ones for data analysts and scientists. Let’s go through them one by one:

- random
- statistics
- math
- datetime
- csv

You can import any of them with this syntax:

`import [module_name]`

For example: `import random`

Note: This imports the entire module with everything in it. You can also import only a part of a module:

`from [module_name] import [item_name]`

But let’s not complicate things with that yet.
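For reference, here is a small sketch of the second form in practice. It uses `sqrt` from `math` and `randint` from `random`. After a `from ... import ...` statement, you use the imported name directly, without the `module_name.` prefix.

```python
# Import only the names you need from each module.
from math import sqrt
from random import randint

# The imported names are used without the "module_name." prefix.
root = sqrt(16)        # 4.0
roll = randint(1, 6)   # random integer from 1 to 6, both inclusive

print(root, roll)
```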

### Python Built-in Module #1: `random`
Randomization is very important in data science. Once you import the `random` module, you can generate random numbers according to various rules.

```python
# First, type this into your Jupyter Notebook:
import random
```

```python
# Then, in a separate cell, try:
random.random()   # Generates a random float in the range [0.0, 1.0).
```

```
0.8067178448670768
```

```python
random.randint(1, 10)   # Generates a random integer between 1 and 10, both inclusive.
```

```
5
```
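Beyond `random()` and `randint()`, a few other functions of the `random` module come up often in data work. The sketch below shows seeding for reproducible results, picking and sampling from a list, shuffling, and drawing from a normal distribution. The `colors` list is only an illustrative example.

```python
import random

# Seeding the generator makes "random" results reproducible:
# the same seed always produces the same sequence of numbers.
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)  # True

colors = ["red", "green", "blue", "yellow"]

# Pick one element at random.
print(random.choice(colors))

# Pick 2 distinct elements (sampling without replacement).
picked = random.sample(colors, 2)
print(picked)

# Shuffle the list in place (the list itself is modified).
random.shuffle(colors)
print(colors)

# Draw a float from a normal distribution with mean 0 and standard deviation 1.
print(random.gauss(0, 1))
```

Seeding is handy when you want a notebook to produce the same "random" sample every time you rerun it.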

### Python Built-in Module #2: `statistics`
The built-in `statistics` module contains functions such as mean, median, mode, standard deviation, variance, and more.

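```python
# Let's try a few of these:
import statistics
```

```python
# Create a sample list:
a = [0, 1, 1, 3, 4, 9, 15]
```

```python
statistics.mean(a)       # arithmetic mean
```

```
4.714285714285714
```

```python
statistics.median(a)     # middle value of the sorted data
```

```
3
```

```python
statistics.mode(a)       # most common value
```

```
1
```

```python
statistics.stdev(a)      # sample standard deviation (based on n - 1)
```

```
5.437961803049794
```

```python
statistics.variance(a)   # sample variance (divides by n - 1)
```

```
29.571428571428566
```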
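Note that `stdev()` and `variance()` compute *sample* statistics, dividing by n - 1. If your list contains the whole population rather than a sample, the module also provides `pstdev()` and `pvariance()`, which divide by n. The small sketch below compares the two on the same list `a`.

```python
import math
import statistics

a = [0, 1, 1, 3, 4, 9, 15]

# variance() and stdev() treat `a` as a SAMPLE and divide by (n - 1).
sample_var = statistics.variance(a)
sample_sd = statistics.stdev(a)

# pvariance() and pstdev() treat `a` as the whole POPULATION and divide by n.
population_var = statistics.pvariance(a)
population_sd = statistics.pstdev(a)

n = len(a)
# The two variances differ only by the factor (n - 1) / n.
print(math.isclose(population_var, sample_var * (n - 1) / n))  # True
print(sample_sd, population_sd)
```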

### Python Built-in Module #3: `math`
Some functions fall under the umbrella of math rather than statistics, so they live in a separate module. The `math` module contains factorial, power, and logarithmic functions, as well as trigonometric functions and constants.

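```python
import math
```

```python
math.factorial(5)   # 5! = 5 * 4 * 3 * 2 * 1
```

```
120
```

```python
math.pi             # the constant pi
```

```
3.141592653589793
```

```python
math.sqrt(5)        # square root
```

```
2.23606797749979
```

```python
math.log(256, 2)    # logarithm of 256 in base 2
```

```
8.0
```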
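Here are a few more `math` functions that come in handy in data work. This small sketch covers powers and exponentials, logarithms in different bases, trigonometry (which works in radians), and rounding.

```python
import math

# Power and exponential functions (both return floats).
two_to_ten = math.pow(2, 10)   # 1024.0
e_squared = math.exp(2)        # e ** 2

# Logarithms: natural log, base 10, and base 2.
ln_e = math.log(math.e)        # natural logarithm (base e)
log_thousand = math.log10(1000)
log_256 = math.log2(256)       # math.log2 is usually more accurate than math.log(x, 2)

# Trigonometric functions expect radians; convert degrees with math.radians().
cos_180 = math.cos(math.radians(180))  # very close to -1.0

# Rounding down and up to the nearest integer.
floor_value = math.floor(2.7)  # 2
ceil_value = math.ceil(2.1)    # 3

print(two_to_ten, log_thousand, log_256, floor_value, ceil_value)
```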

### Python Built-in Module #4: `datetime`
Do you plan to work for an online startup? Then you will probably encounter a lot of data logs, and the heart of a data log is the date and time. Dates and times are not part of Python 3’s core data types, but once you import the `datetime` module, you get access to functions for handling them.

```python
import datetime
```

#### I think Python’s datetime module is a bit over-complicated, or at least not easy for beginners to use. For now, let’s try these two functions to get more familiar with it:

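```python
# now() is a method of the datetime class, which lives in the datetime module.
datetime.datetime.now()
```

```
datetime.datetime(2020, 11, 16, 20, 18, 47, 309572)
```

```python
# Format the current date as YEAR-MONTH-DAY.
# "%Y-%m-%d" is the portable spelling; the "%F" shorthand is platform-dependent.
datetime.datetime.now().strftime("%Y-%m-%d")
```

```
'2020-11-16'
```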
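Two datetime features you will likely need for data logs are parsing timestamps from strings with `strptime()` and doing arithmetic with `timedelta` objects. The sketch below assumes timestamps in the format `YYYY-MM-DD HH:MM:SS`. The two log entries are made-up examples.

```python
import datetime

# Hypothetical timestamps, as they might appear in a web log.
log_entries = ["2020-11-16 20:18:47", "2020-11-17 08:05:10"]

# strptime() parses a string into a datetime object, given its format.
fmt = "%Y-%m-%d %H:%M:%S"
first = datetime.datetime.strptime(log_entries[0], fmt)
second = datetime.datetime.strptime(log_entries[1], fmt)

# Subtracting two datetimes gives a timedelta.
elapsed = second - first
print(elapsed)                  # 11:46:23
print(elapsed.total_seconds())  # 42383.0

# Adding a timedelta shifts a datetime forward in time.
one_week_later = first + datetime.timedelta(days=7)
print(one_week_later.strftime("%Y-%m-%d"))  # 2020-11-23
```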

### Python Built-in Module #5: `csv`
“csv” stands for “comma-separated values”, and it is one of the most common plain-text file formats for data logs. So you definitely need to know how to open a .csv file in Python. Here is one way to do it; just follow this example.

Let’s say you have a small .csv file called `AutoCollision.csv`, stored in a `Data` folder.


```python
# IPython/Jupyter magic command: print the current working directory
%pwd
```

```
'C:\\Users\\chris\\PythonProgramming'
```

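```python
import os

# Change the working directory to the folder that contains the .csv file
# (adjust this path to wherever your copy of the file is stored).
os.chdir('C:\\Users\\chris\\PythonProgramming\\Data')
```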

```python
import csv

# newline='' is what the csv module documentation recommends when opening CSV files.
with open('AutoCollision.csv', newline='') as csvfile:
    # The file is comma-separated, so split each line on ','
    # (',' is also csv.reader's default delimiter).
    my_csv_file = csv.reader(csvfile, delimiter=',')
    for row in my_csv_file:
        # Each row is returned as a list of strings.
        print(row)
```

```
['AgeGroup', 'VehicleUse', 'ClaimSeverity', 'ClaimCount']
['17 to 20', 'Pleasure', '250.48', '21']
['17 to 20', 'DriveShort', '274.78', '40']
['17 to 20', 'DriveLong', '244.52', '23']
['17 to 20', 'Business', '797.8', '5']
['21 to 24', 'Pleasure', '213.71', '63']
['21 to 24', 'DriveShort', '298.6', '171']
['21 to 24', 'DriveLong', '298.13', '92']
['21 to 24', 'Business', '362.23', '44']
['25 to 29', 'Pleasure', '250.57', '140']
['25 to 29', 'DriveShort', '248.56', '343']
['25 to 29', 'DriveLong', '297.9', '318']
['25 to 29', 'Business', '342.31', '129']
['30 to 34', 'Pleasure', '229.09', '123']
['30 to 34', 'DriveShort', '228.48', '448']
['30 to 34', 'DriveLong', '293.87', '361']
['30 to 34', 'Business', '367.46', '169']
['35 to 39', 'Pleasure', '153.62', '151']
['35 to 39', 'DriveShort', '201.67', '479']
['35 to 39', 'DriveLong', '238.21', '381']
['35 to 39', 'Business', '256.21', '166']
['40 to 49', 'Pleasure', '208.59', '245']
['40 to 49', 'DriveShort', '202.8', '970']
['40 to 49', 'DriveLong', '236.06', '719']
['40 to 49', 'Business', '352.49', '304']
...
```
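Instead of plain lists, `csv.DictReader` can return each row as a dictionary keyed by the header row. The sketch below assumes the same columns as `AutoCollision.csv`, which is comma-separated. It copies that file’s first few rows into an in-memory string (`io.StringIO`) so the example runs without the file, and it sums `ClaimCount` per age group.

```python
import csv
import io
from collections import defaultdict

# A few rows from AutoCollision.csv, kept in memory so the example runs
# without the file. To use the real file, replace io.StringIO(...) with
# open('AutoCollision.csv', newline='').
sample = io.StringIO(
    "AgeGroup,VehicleUse,ClaimSeverity,ClaimCount\n"
    "17 to 20,Pleasure,250.48,21\n"
    "17 to 20,DriveShort,274.78,40\n"
    "17 to 20,DriveLong,244.52,23\n"
    "17 to 20,Business,797.8,5\n"
    "21 to 24,Pleasure,213.71,63\n"
    "21 to 24,DriveShort,298.6,171\n"
    "21 to 24,DriveLong,298.13,92\n"
    "21 to 24,Business,362.23,44\n"
)

claims_per_group = defaultdict(int)

# DictReader uses the header row as the keys for every following row.
for row in csv.DictReader(sample):
    # All values are read as strings, so convert numbers explicitly.
    claims_per_group[row["AgeGroup"]] += int(row["ClaimCount"])

print(dict(claims_per_group))  # {'17 to 20': 89, '21 to 24': 370}
```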

### More built-in modules
This is a good start, but far from the full list of Python built-in modules. With other modules you can zip and unzip files, scrape websites, send emails, and do a lot of other exciting things. If you want to see the full list, check out the Python Standard Library reference (https://docs.python.org/3/library/), which is part of the official Python documentation.
And, as I mentioned, there are other Python libraries and packages that are not part of the standard library (like pandas, numpy, scipy, etc.) – I’ll write more about them soon!

### Syntax

Now that you have seen how import works, let’s talk briefly about the syntax!

#### Three things:

1. Usually, we put all the import statements at the beginning of a Python script. Why? So that it is easy to see which modules the script relies on, and to make sure the modules are imported before we need them. So keep this advice in mind: **import at the beginning of the script**.

2. In this lecture note, we called the functions of the modules using this syntax: `module_name.function_name(parameters)`

E.g. `statistics.median(a)` or `csv.reader(csvfile, delimiter=';')`. This is logical: before you apply a given function, you have to tell Python which module to find it in.

In some cases there are even more complicated relationships, like methods of classes inside a module (e.g. `datetime.datetime.now()`), but let’s not confuse ourselves with that for now.

My suggestion is to make a list of your favorite modules and functions and learn how they work; when you need a new one, check the official Python documentation and add the new module and its functions to your list.

3. When you import a module (or a package), you can rename it using the `as` keyword. If you type:

```python
import statistics as stat
```

You then have to refer to your module as `stat`, e.g. `stat.median(a)` and not `statistics.median(a)`. By convention, two very well-known data science libraries are imported with shortened names:

`numpy` (`import numpy as np`) and `pandas` (`import pandas as pd`).
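The three points above fit together in a short script like the one below: imports at the top, an alias created with `as`, and functions called through the module (or alias) name. The list `a` is the sample from the statistics section. `datetime.date.today()` shows one more method of a class inside a module.

```python
# Imports go at the top of the script, so the dependencies are visible at a glance
# and every module is loaded before it is used.
import datetime
import statistics as stat   # 'stat' is the alias used in the text above

a = [0, 1, 1, 3, 4, 9, 15]

# module_name.function_name(parameters), here called through the alias 'stat'.
median_value = stat.median(a)

# A method of a class inside a module: module.Class.method().
today = datetime.date.today()

print(median_value)  # 3
print(today.isoformat())
```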

Here is a summary of terms:

- **Function**: it’s a block of code that you can (re-)use by calling it by its name.

E.g. `print()` is a function.

- **Module**: it’s usually a `.py` file that contains a collection of functions (it can also contain variables and classes). Some built-in modules, such as `math`, are written in C instead.

E.g. in `statistics.mean(a)`, `mean` is a function found in the `statistics` module.

- **Package**: it’s a collection of Python modules. 

E.g. in `numpy.random.randint(2, size=10)`, `randint()` is a function in `numpy.random`, a submodule of the numpy package.

- **Library**: it’s a more general term for a collection of Python code.
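To make these terms concrete using only the standard library (numpy may not be installed), the sketch below uses `urllib`, a standard-library package. It contains the module `urllib.parse`, which in turn contains the function `urlparse()`. Checking for a `__path__` attribute is one way to tell a package from a plain module.

```python
import statistics
import urllib.parse   # this also binds the package name 'urllib'

# A package is a collection of modules; packages have a __path__ attribute.
print(hasattr(urllib, "__path__"))      # True: urllib is a package
print(hasattr(statistics, "__path__"))  # False: statistics is a plain module

# A module typically corresponds to a single .py file.
print(statistics.__file__)

# urllib.parse is a module inside the urllib package; urlparse() is a function in it.
parts = urllib.parse.urlparse("https://docs.python.org/3/library/")
print(parts.netloc)  # docs.python.org
```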

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Reference book: *Learning Python*, 5th Edition, by Mark Lutz