# Python Basics for Data Science 
***
### IntroPython2.1 Python Basics-Operators  
### IntroPython2.2 Python Basics-Variables, Data Types, and Data Type Conversion
### IntroPython2.3 Python Basics-Data Structures
### IntroPython2.4 Python Basics-Built-in Functions and Methods
### IntroPython2.5 Python Basics-Create Our Own Function and Lambda
### IntroPython2.6 Python Basics-If Statement
### IntroPython2.7 Python Basics-Loops
### IntroPython2.8 Python Basics-Import Statement and Important Built-in Modules, Syntax Essentials and Best Practices
***

## Important Built-in Modules, Syntax Essentials and Best Practices - Table of Contents
### 1. Important Built-in Modules
### 2. The 3 Major Things to Keep in Mind about Python Syntax
### 3. The 6 Other Python Best Practices for Nicer Formatting

I’ve put together the Python syntax essentials we should keep in mind as data science and data analytics professionals. I have also added some formatting best practices to help us keep our code nice and clean.

## 1. Important Built-in Modules

### Modules are divided into three groups:

1. `The modules of the Python Standard Library`: These come with Python 3 by default, so they are easy to get. You simply type `import` and the name of the module, and from that point on you can use the module in your code.

2. `More advanced and more specialized modules`: These modules are not part of the standard library, so you have to install them as new packages in your Python environment first. In data science we use many of these “external” packages. (The ones you might have heard about are pandas, numpy, matplotlib, scikit-learn, etc.)

3. `Your own modules`: Yes, you can write new modules yourself, too! (We won’t cover this here.)

#### Anyway, `import` is a really powerful concept in Python – because with that you’ll be able to expand your toolset continuously and almost infinitely when you are dealing with different data science challenges.

## The most important Python Built-in Modules for Data Scientists

Okay, now that you get the concept, it’s time to see it in practice. As I have mentioned, the Python Standard Library contains dozens of built-in modules. From those, I have picked the five most important ones for data analysts and data scientists, and we will go through them one by one. These are:

- random
- statistics
- math
- datetime
- csv

You can easily import any of them by using this syntax:

`import` [module_name]

e.g. `import random`

Note: This imports the entire module with all the items in it. You can also import only a part of a module:

`from` [module_name] `import` [item_name]. But let’s not complicate things with that yet.
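For reference only, the two import forms above can be compared in a minimal sketch. The choice of `math.sqrt` is just an illustration; any item of any module behaves the same way, and the variable names are hypothetical.

```python
# Form 1: import the whole module, then use the module name as a prefix.
import math
root_with_prefix = math.sqrt(16)

# Form 2: import a single item; it can then be used without the prefix.
from math import sqrt
root_without_prefix = sqrt(16)

print(root_with_prefix, root_without_prefix)   # 4.0 4.0
```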

### Python Built-in Module #1: `random`
Randomization is very important in data science. If you import the random module, you can generate random numbers by various rules.

In [36]:
# Let’s type this to your Jupyter Notebook first:
import random

In [37]:
# Then in a separate cell try out:
random.random()   # This will generate a random float between 0 (included) and 1 (not included).

0.9756486372395423

In [38]:
random.randint(1, 10)   # This will generate a random integer between 1 and 10 (both included).

7
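In data science work it is often useful to make "random" results repeatable. The sketch below assumes you want the same numbers on every run and uses `random.seed()` for that; the seed value 42 and the `colors` list are arbitrary, hypothetical choices.

```python
import random

# Fixing the seed makes the "random" sequence repeatable between runs.
random.seed(42)                      # 42 is an arbitrary seed value
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)                      # same seed again -> same numbers again
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)       # True

# random.choice() picks one element from a sequence.
colors = ["red", "green", "blue"]    # hypothetical sample list
picked_color = random.choice(colors)
print(picked_color)
```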

### Python Built-in Module #2: `statistics`
There is a statistics built-in module which contains functions like: mean, median, mode, standard deviation, variance and more.

In [39]:
# Let’s try a few of these:
import statistics

In [40]:
# Create a sample list:
a = [0, 1, 1, 3, 4, 9, 15]

In [41]:
statistics.mean(a)

4.714285714285714

In [42]:
statistics.median(a)

3

In [43]:
statistics.mode(a)

1

In [44]:
statistics.stdev(a)

5.437961803049794

In [45]:
statistics.variance(a)

29.571428571428566
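One detail worth checking for yourself: `stdev()` and `variance()` are the *sample* versions (they divide by n - 1), while `pstdev()` and `pvariance()` are the *population* versions (they divide by n). A small sketch using the same list `a` as above:

```python
import math
import statistics

a = [0, 1, 1, 3, 4, 9, 15]                   # same sample list as above

sample_var = statistics.variance(a)          # divides by n - 1
sample_sd = statistics.stdev(a)              # square root of the sample variance
population_var = statistics.pvariance(a)     # divides by n

# The standard deviation squared gives back the variance.
print(math.isclose(sample_sd ** 2, sample_var))   # True

# Dividing by n instead of n - 1 gives a smaller number.
print(population_var < sample_var)                # True
```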

### Python Built-in Module #3: `math`
There are a few functions that are under the umbrella of math rather than statistics. So there is a separate module for that. This contains factorial, power, and logarithmic functions, but also some trigonometry and constants.

In [46]:
import math

In [47]:
math.factorial(5)

120

In [48]:
math.pi

3.141592653589793

In [49]:
math.sqrt(5)

2.23606797749979

In [50]:
math.log(256, 2)

8.0
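A few more functions from the same module, as a quick sketch of the power, logarithmic, trigonometric and rounding functions mentioned above. The variable names are only for illustration.

```python
import math

natural_log = math.log(math.e)        # with no base given, log() is the natural logarithm
power = math.pow(2, 8)                # power function, always returns a float
sine = math.sin(math.pi / 2)          # trigonometric functions expect radians
rounded_down = math.floor(2.7)        # largest integer <= 2.7
rounded_up = math.ceil(2.2)           # smallest integer >= 2.2

print(natural_log, power, sine)
print(rounded_down, rounded_up)       # 2 3
```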

### Python Built-in Module #4: `datetime`
Do you plan to work for an online startup? Then you will probably encounter a lot of data logs. And the heart of a data log is the datetime. The core of Python 3 has no date or time type available without an import, but once you import the datetime module, you get access to classes and functions for handling dates and times, too.

In [51]:
import datetime

#### I think the implementation of the datetime module of Python is a bit over-complicated… at least, it’s not easy to use for beginners. For now let’s try these two functions to get a bit more familiar with it:

In [52]:
datetime.datetime.now()

datetime.datetime(2023, 1, 30, 19, 58, 36, 429740)

In [53]:
datetime.datetime.now().strftime("%Y-%m-%d")   # "%Y-%m-%d" is the portable spelling of "%F"

'2023-01-30'
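Two more things you will typically need with data logs are turning a text timestamp into a datetime object and doing date arithmetic. Here is a minimal sketch; the timestamp string is a hypothetical log entry, and the variable names are only for illustration.

```python
import datetime

# Parse a timestamp string (hypothetical log entry) into a datetime object.
log_time = datetime.datetime.strptime("2023-01-30 19:58:36", "%Y-%m-%d %H:%M:%S")

# Format it back into text, keeping only the date part.
date_text = log_time.strftime("%Y-%m-%d")

# Date arithmetic: add one week with timedelta.
one_week_later = log_time + datetime.timedelta(days=7)

print(date_text)                               # 2023-01-30
print(one_week_later.strftime("%Y-%m-%d"))     # 2023-02-06
print(one_week_later - log_time)               # 7 days, 0:00:00
```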

### Python Built-in Module #5: `csv`
“csv” stands for “comma-separated values” and it’s one of the most common file formats for plain text data logs. So you definitely have to know how to open a .csv file in Python. There is a certain way to do that – just follow this example.

Let’s say you have this small .csv file.


In [54]:
%pwd

'C:\\Users\\yumei\\MSCA37014PythonForAnalyticsSummer2022\\Data'

In [57]:
import os
os.chdir(r'C:\Users\yumei\CSP Workshop 2023')

In [58]:
%pwd

'C:\\Users\\yumei\\CSP Workshop 2023'

In [60]:
os.chdir(r'C:\Users\yumei\CSP Workshop 2023\Data')

In [61]:
%pwd

'C:\\Users\\yumei\\CSP Workshop 2023\\Data'

In [66]:
import csv

with open(r'C:\Users\yumei\CSP Workshop 2023\Data\AutoCollision.csv', newline='') as csvfile:
    # The file is comma-separated, so the delimiter has to be ',' (which is also the default).
    my_csv_file = csv.reader(csvfile, delimiter=',')
    for row in my_csv_file:
        print(row)

['AgeGroup', 'VehicleUse', 'ClaimSeverity', 'ClaimCount']
['17 to 20', 'Pleasure', '250.48', '21']
['17 to 20', 'DriveShort', '274.78', '40']
['17 to 20', 'DriveLong', '244.52', '23']
['17 to 20', 'Business', '797.8', '5']
['21 to 24', 'Pleasure', '213.71', '63']
['21 to 24', 'DriveShort', '298.6', '171']
['21 to 24', 'DriveLong', '298.13', '92']
['21 to 24', 'Business', '362.23', '44']
['25 to 29', 'Pleasure', '250.57', '140']
['25 to 29', 'DriveShort', '248.56', '343']
['25 to 29', 'DriveLong', '297.9', '318']
['25 to 29', 'Business', '342.31', '129']
['30 to 34', 'Pleasure', '229.09', '123']
['30 to 34', 'DriveShort', '228.48', '448']
['30 to 34', 'DriveLong', '293.87', '361']
['30 to 34', 'Business', '367.46', '169']
['35 to 39', 'Pleasure', '153.62', '151']
['35 to 39', 'DriveShort', '201.67', '479']
['35 to 39', 'DriveLong', '238.21', '381']
['35 to 39', 'Business', '256.21', '166']
['40 to 49', 'Pleasure', '208.59', '245']
['40 to 49', 'DriveShort', '202.8', '970']
['40 to 49', 'DriveLong', '236.06', '719']
['40 to 49', 'Business', '352.49', '304']
...
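If you prefer to access the columns by name, `csv.DictReader` is an alternative worth knowing. The sketch below is only an illustration: it keeps a few rows from the file above in memory (via `io.StringIO`) so that it runs without the file on disk, and the variable names are hypothetical.

```python
import csv
import io

# A few rows from the file above, held in memory instead of read from disk.
sample_text = """AgeGroup,VehicleUse,ClaimSeverity,ClaimCount
17 to 20,Pleasure,250.48,21
17 to 20,DriveShort,274.78,40
17 to 20,DriveLong,244.52,23
"""

# DictReader uses the first line as column names; comma is the default delimiter.
reader = csv.DictReader(io.StringIO(sample_text))
rows = list(reader)

# Every value arrives as a string, so convert before doing arithmetic.
total_claims = sum(int(row["ClaimCount"]) for row in rows)

print(rows[0]["AgeGroup"], rows[0]["VehicleUse"])   # 17 to 20 Pleasure
print(total_claims)                                 # 84
```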

### More built-in modules
This is a good start but far from the whole list of the Python built-in modules. With other modules you can zip and unzip files, scrape websites, send emails, and do a lot of other exciting things. If you want to take a look at the whole list, check out the Python Standard Library (https://docs.python.org/3/library/), which is part of the official Python documentation.
And, as I mentioned, there are other Python libraries and packages that are not part of the standard library (like pandas, numpy, scipy, etc.) – I’ll write more about them soon!

A few more important things about the `import` statement:

1. Usually, in Python scripts, we put all the import statements at the beginning of our script. Why is that? To see at a glance which modules our script relies on, and to make sure that the modules are imported before we need to use them. So keep this advice in mind: **put your import statements at the beginning of the script**.

2. In this lecture note, we applied the functions of the modules using this syntax: `module_name.function_name`(parameters)

E.g.:

```markdown
statistics.median(a)
csv.reader(csvfile, delimiter=',')
```

This is logical: before you apply a given function, you have to tell Python in which module to find it.

In some cases there are even more complicated relationships, like functions of classes in a module (e.g. `datetime.datetime.now`()), but let’s not confuse ourselves with that for now.

My suggestion is to make a list of your favorite modules and functions and learn how they work; if you need a new one, check out the original Python documentation and add the new module plus its function to your list.

3. When you import a module (or a package), you can rename it using the `as` keyword. If you type:

```markdown
import statistics as stat
```

you have to refer to the module as `stat`, e.g. `stat.median`(a) and not statistics.median(a). By convention, two very well-known data science libraries are imported under shortened names:

`numpy` (import numpy as np) and `pandas` (import pandas as pd).
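As a quick, runnable illustration of renaming with `as`, here is a sketch that uses only the standard library; the list `a` is the sample list from the statistics section.

```python
import statistics as stat     # "stat" is now the name used in the code

a = [0, 1, 1, 3, 4, 9, 15]    # sample list from the statistics section
median_value = stat.median(a)

print(median_value)           # 3
```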

Here is a summary of terms:

- **Function**: a block of code that you can (re-)use by calling it by its name.

E.g. print() is a function.

- **Module**: a `.py` file that contains a collection of functions (it can also contain variables and classes).

E.g. in statistics.mean(a), mean is a function that is found in the statistics module.

- **Package**: a collection of Python modules.

E.g. in numpy.random.randint(2, size=10), randint() is a function in the random module of the numpy package.

- **Library**: a more general term for a collection of Python code.

## 2. The 3 Major Things to Keep in Mind about Python Syntax
### 1) Line Breaks Matter
In Python, line breaks matter. This means that in most cases, if you put a line break where you shouldn’t, you will get an error message.

So here’s **Python syntax rule #1: one statement per line**.

**There are some exceptions, though.**

Expressions

- in parentheses (e.g. function and method calls),
- in square brackets (e.g. lists),
- and in curly braces (e.g. dictionaries)

can actually be split into more lines. This is called implicit line joining and it is a great help when working with bigger data structures.

In [67]:
# Example: 
my_movies = ['How I Met Your Mother', 'Friends', 'Silicon Valley', 'Family Guy',
             'South Park']
my_movies

['How I Met Your Mother',
 'Friends',
 'Silicon Valley',
 'Family Guy',
 'South Park']
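The same implicit line joining works inside curly braces and parentheses. A small sketch, with hypothetical example data:

```python
# A dictionary split across lines inside curly braces.
claim_counts = {            # hypothetical example data
    "Pleasure": 21,
    "DriveShort": 40,
    "DriveLong": 23,
}

# A function call split across lines inside parentheses.
total = sum(
    claim_counts.values()
)

print(total)   # 84
```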

#### Additionally, 
- You can also break any statement into more than one line if you put a backslash `\` at the end of the line.
- And you can do the opposite, too: put more than one statement on one line, using semicolons (`;`) between the statements.
- However, these two methods are not too common, and I’d recommend using them only when necessary (e.g. with really long statements of 80+ characters).

In [68]:
print("Hello"); print("Hi"); print("Bye")

Hello
Hi
Bye


In [69]:
print\
("He\
llo")

Hello


### 2) Indentations Matter

Why do we need indentations? Somehow you have to indicate which code blocks belong together, e.g. where an **if** statement or a **for** loop begins and ends. Languages that do not rely on indentation have to use something else for that: JavaScript, for example, frames code blocks with curly braces, and bash uses extra keywords. In Python, you have to use indentations, which in my opinion is the most elegant way to solve this issue.

So we have **Python syntax rule #2: make sure that you use indentations correctly and consistently.**

Note: We talked about the exact syntax rules governing if statements and for loops in IntroPython2.6 and IntroPython2.7.
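To see how indentation marks a block, here is a minimal sketch. The second part passes a deliberately wrong snippet to the built-in `compile()` only to show, without crashing the notebook, that Python rejects it; the variable names are hypothetical.

```python
numbers = [1, 2, 3]
doubled = []

for n in numbers:
    # These two indented lines form the body of the loop: they run once per item.
    result = n * 2
    doubled.append(result)

# Back at the left margin: this line runs once, after the loop has finished.
print(doubled)   # [2, 4, 6]

# A loop body that is not indented is rejected before the code even runs.
bad_code = "for n in [1, 2, 3]:\nprint(n)"
indentation_rejected = False
try:
    compile(bad_code, "<example>", "exec")
except IndentationError as error:
    indentation_rejected = True
    print("IndentationError:", error.msg)
```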

### 3) Case Sensitivity

Python is case sensitive. It makes a difference whether you type **and** (correct) or **AND** (won’t work). **As a rule of thumb, most Python keywords have to be written in lowercase letters**.

The most commonly used exceptions (I mention them because I see many beginners have trouble with them) are the Boolean values. These are correctly spelled as True and False (not TRUE, nor true). The same goes for None.

There’s **Python syntax rule #3: Python is case sensitive.**
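You can check the spelling of keywords yourself with the standard library's `keyword` module. A short sketch; the two variable names at the end are hypothetical and only show that names are case sensitive as well.

```python
import keyword

print(keyword.iskeyword("and"))     # True
print(keyword.iskeyword("AND"))     # False
print(keyword.iskeyword("True"))    # True
print(keyword.iskeyword("true"))    # False

# Names are case sensitive too: these are two different variables.
character = "lowercase name"
Character = "capitalized name"
print(character == Character)       # False
```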

## 3. The 6 Other Python Best Practices for Nicer Formatting
Here are a few (non-mandatory but highly recommended) Python best practices that will make your code much nicer, more readable and more reusable.

### Python Best Practice #1: Use Comments
You can add comments to your Python code. Simply use the **#** character. Everything that comes after the # on that line won’t be executed.

In [70]:
# This is a comment before my for loop.
for i in range(0, 30, 2):
    print(i)

0
2
4
6
8
10
12
14
16
18
20
22
24
26
28


### Python Best Practice #2: Variable Names
- Conventionally, **variable names should be written in lowercase letters, with the words in them separated by `_` characters**. Also, I generally do not recommend using one-letter variable names in your code. Meaningful and easy-to-distinguish variable names help other programmers a lot when they want to understand your code, and they make the code **more flexible, reusable and understandable**.

```markdown
my_meaningful_variable = 100
```

- **Do not use Python keywords to name your variables**: reserved keywords such as `for` or `class` cause a syntax error, and reusing the name of a built-in function such as `list` or `print` hides the original function.

- Everything in Python is case sensitive, e.g. "character" is different from "Character".

- Variable names cannot start with `numbers`.

- Variable names cannot start with (or contain) `special symbols` such as `$` or `-`; the underscore `_` is the exception.

- Use `_` to separate different words (vs. R, where `.` is commonly used as the separator).

- Use `#` to make comments.
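The naming rules above can be tested with two standard-library tools: the string method `isidentifier()` and `keyword.iskeyword()`. The helper `is_usable_name` and the candidate names below are hypothetical and written only for this sketch.

```python
import keyword


def is_usable_name(name):
    """Return True if `name` is a legal variable name and not a reserved keyword."""
    # isidentifier() checks the characters; iskeyword() checks reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["claim_count", "2nd_claim", "claim-count", "class"]

for name in candidates:
    print(name, "->", "OK" if is_usable_name(name) else "not allowed")
```

Only `claim_count` passes: `2nd_claim` starts with a digit, `claim-count` contains a special symbol, and `class` is a reserved keyword.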

### Python Best Practice #3: Use Blank Lines
If you want to separate code blocks visually (e.g. when you have a 100 line Python script in which you have 10-12 blocks that belong together) you can use **blank lines**. Even multiple blank lines. It won’t affect the result of your script.

In [71]:
down = 0
up = 100

for i in range(1,10):
    guessed_age = int((up + down) / 2)
    answer = input('Are you ' + str(guessed_age) + ' years old?')    
    if answer == 'correct':
        print("Nice")
        break
    elif answer == 'less':
        up = guessed_age
    elif answer == 'more':
        down = guessed_age
    else:
        print('wrong answer')

Are you 50 years old?more
Are you 75 years old?more
Are you 87 years old?less


KeyboardInterrupt: Interrupted by user

### Python Best Practice #4: Use White Spaces Around Operators and Assignments

For cleaner code it’s worth using spaces around your `=` signs and your mathematical and comparison operators `(>, <, +, -, etc.)`. If you don’t use white spaces, your code will run anyway, but again: the cleaner the code, the easier to read it, the easier to reuse it.

In [None]:
# Example:
number_x = 10
number_y = 100
number_mult = number_x * number_y 
number_mult

### Python Best Practice #5: Max Line Length Should Be 79 Characters
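If you reach 79 characters in a line, it’s recommended to break your code into more lines. You can use the above-mentioned backslash `\` character: with a `\` at the end of the line, Python ignores the line break and reads your code as if it were one line. Inside parentheses, brackets or braces you don’t even need the backslash, and the official style guide (PEP 8) prefers this implicit line joining.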
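Here is a small sketch of both ways to wrap a long statement. The variable names and values are hypothetical and chosen only to make the expression long.

```python
# Hypothetical values, used only to make the expression long.
claim_severity_pleasure = 250.48
claim_severity_drive_short = 274.78
claim_severity_drive_long = 244.52

# Option 1: wrap the expression in parentheses (implicit line joining).
total_with_parentheses = (
    claim_severity_pleasure
    + claim_severity_drive_short
    + claim_severity_drive_long
)

# Option 2: end each unfinished line with a backslash.
total_with_backslash = claim_severity_pleasure \
    + claim_severity_drive_short \
    + claim_severity_drive_long

print(total_with_parentheses == total_with_backslash)   # True
```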

### Python Best Practice #6: Stay Consistent
And one of the most important rules: always stay consistent! Even if you follow the above rules, in specific situations you’ll have to create your own. Either way: make sure you are using these rules consistently. Ideally, you have to create Python scripts that you can open 6 months later without any trouble understanding them. If you randomly change your formatting rules and naming conventions, you’ll create an unnecessary headache for your future self. So stay consistent!

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Referenced books:
- Learning Python, 5th Edition, by Mark Lutz
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 