# <font color=green> PYTHON FOR DATA SCIENCE - NUMPY
---

# <font color=green> 1. INTRODUCTION TO PYTHON
---

# 1.1 Introduction

> Python is a high-level programming language that supports multiple programming paradigms. It is an *open source* project and since its inception in 1991, it has become one of the most popular interpreted programming languages.
>
> In recent years, Python has developed an active community around scientific computing and data analysis, and it stands out as one of the most relevant languages for data science and machine learning, both in academia and in industry.

# 1.2 Installation and development environment

### Local Installation

### https://www.python.org/downloads/
### or
### https://www.anaconda.com/distribution/

### Google Colaboratory

### https://colab.research.google.com

### Checking version

In [None]:
# Alternative: !python --version
!python -V 

Python 3.7.13


# 1.3 Working with Numpy arrays

In [None]:
import numpy as np

In [None]:
km = np.loadtxt('cars-km.txt')

In [None]:
years = np.loadtxt('cars-years.txt', dtype = int)

### Getting the average mileage per year

In [None]:
km_average = km / (2019 - years)

In [None]:
km_average

array([2.77562500e+03, 2.04000000e+02, 1.28010345e+03,            nan,
       1.98130769e+03, 1.53257143e+03,            nan, 7.75990000e+03,
       1.10218889e+04, 4.74725000e+03, 7.56411765e+02, 6.71000000e+02,
       4.98738889e+03,            nan, 4.14570000e+03, 3.85356667e+04,
       6.63557143e+03,            nan, 1.23620000e+04, 7.58650000e+03,
       5.95252941e+03,            nan, 3.92316000e+03, 3.67710714e+03,
                  nan,            nan, 1.93166667e+03,            nan,
       3.46164706e+03, 3.37075000e+03, 1.37104545e+03, 2.22216667e+03,
       1.77200000e+04, 1.20742857e+03, 1.81368000e+04, 1.83229167e+03,
                  nan,            nan, 5.52600000e+02,            nan,
       1.55691667e+04, 2.54762500e+03,            nan, 5.07658824e+03,
                  nan, 5.73823529e+02, 4.66300000e+03, 1.33055556e+02,
                  nan, 5.02181250e+03, 8.55540000e+03, 3.88430769e+03,
       3.56400000e+03, ...

In [None]:
type(km_average)

numpy.ndarray

# <font color=green> 2. BASIC CHARACTERISTICS OF THE LANGUAGE
---

# 2.1 Math operations

### Arithmetic operators: $+$, $-$, $*$, $/$, $**$, $\%$, $//$

### Addition ($+$)

In [None]:
2 + 2

4

### Subtraction ($-$)

In [None]:
2-2

0

### Multiplication ($*$)

In [None]:
3 * 3

9

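### Division ($/$) and floor division ($//$)
The `/` operator always returns a floating-point number, even when both operands are integers. The `//` operator performs floor division and rounds the result down to the nearest whole number.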

In [None]:
10 / 3

3.3333333333333335

In [None]:
10 // 3

3
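
The difference between the two division operators is easiest to see with a negative operand. The following is a minimal sketch using only built-in operators and the built-in `divmod()` function; the variable names are chosen for illustration.

```python
# "/" always produces a float, even when the result is a whole number.
true_division = 10 / 2            # 5.0

# "//" rounds toward negative infinity, not toward zero.
quotient = 10 // 3                # 3
negative_quotient = -10 // 3      # -4

# divmod() returns the floor quotient and the remainder in one call.
q, r = divmod(10, 3)              # (3, 1)

print(true_division, quotient, negative_quotient, q, r)
```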

### Exponentiation ($**$)

In [None]:
2 ** 3

8

### Remainder of division ($\%$)

In [None]:
10 % 3

1

In [None]:
10 % 2

0

### Mathematical expressions

In [None]:
5 * 2 + 3 * 2

16

In [None]:
(5 * 2) + (3 * 2)

16

In [None]:
5 * (2 + 3) * 2

50

### Demystifying expressions

In [None]:
10 % 2 + 3 // 10

0

In [None]:
5 * (2 + 3) / 2

12.5

In [None]:
2 ** 3 * 4

32

### The variable _

In interactive mode, the last printed result is assigned to the variable _

In [None]:
5 * 2

10

In [None]:
_ + 3 * 2

16

In [None]:
_ / 8

2.0

# 2.2 Variables 

### Variable names

- Variable names can start with letters (a - z, A - Z) or the character *underscore* (_):

    > Height
    >
    > _weight
    
- The rest of the name can contain letters, numbers and the character "_":

    > name_of_variable
    >
    > _value
    >
    > day_11_28_
    

- The names are *case sensitive*:

    > Name_Of_Variable $\ne$ name_of_variable $\ne$ NAME_OF_VARIABLE
    
### <font color=red>Note:
- There are some reserved words in the language that cannot be used as variable names:

| |List of reserved words in Python| |
|:-------------:|:------------:|:-------------:|
| and           | as           | not           | 
| assert        | finally      | or            | 
| break         | for          | pass          | 
| class         | from         | nonlocal      | 
| continue      | global       | raise         | 
| def           | if           | return        | 
| del           | import       | try           | 
| elif          | in           | while         | 
| else          | is           | with          | 
| except        | lambda       | yield         | 
| False         | True         | None          | 
| async         | await        |               | 
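
The exact list of reserved words depends on the Python version. As a small sketch, the standard library `keyword` module can be used to inspect the list for the interpreter that is running; the names `reserved` and `height` below are only illustrative.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
reserved = keyword.kwlist
print(len(reserved))

# keyword.iskeyword() tells whether a given string is a reserved word.
print(keyword.iskeyword('lambda'))   # True
print(keyword.iskeyword('height'))   # False
```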

### Declaration of variables

### Assignment operators: $=$, $+=$, $-=$, $*=$, $/=$, $**=$, $\%=$, $//=$

In [None]:
a = 2
a

2

In [None]:
a += 3
a

5

In [None]:
a -= 3
a

2

In [None]:
a *= 3
a

6

In [None]:
a /= 4
a

1.5

In [None]:
a **= 3
a

3.375

In [None]:
a %= 2
a

1.375

In [None]:
a //= 1
a

1.0

# $$km_{average} = \frac {km_{total}}{(Year_{current} - Year_{manufacturing})}$$

### Operations with variables

In [None]:
year_current = 2019
year_manufacturing = 2003
km_total = 44410.0

In [None]:
km_average = km_total / (year_current - year_manufacturing)

In [None]:
km_average

2775.625

### Conclusion:
```
"value = value + 1" is equivalent to "value += 1"
```

### Multiple declaration

In [None]:
year_current, year_manufacturing, km_total, car = 2022, 2015, 13710.76, "Cross Fox"

In [None]:
year_current

2022

In [None]:
year_manufacturing

2015

In [None]:
km_total

13710.76

In [None]:
car

'Cross Fox'

# 2.3 Data types

Data types specify how numbers and characters will be stored and manipulated within a program. Python's basic data types are:

1. **Number**
    - ***int*** - Integers
    - ***float*** - Floating point
2. **Boolean** - Takes the values True or False. Essential when we start working with conditional statements.
3. ***String*** - A sequence of one or more characters that can include letters, numbers, and other types of characters. Represents text.
4. **None** - Represents the absence of a value.

### Number

In [None]:
current_year = 2019

In [None]:
type(current_year)

int

In [None]:
km_total = 44.09

In [None]:
type(km_total)

float

### Boolean

In [None]:
brand_new_car = True

In [None]:
type(brand_new_car)

bool

In [None]:
brand_new_car = False

In [None]:
type(brand_new_car)

bool

### String

In [None]:
name = 'New Jetta'
name

'New Jetta'

In [None]:
name = "New Jetta"
name

'New Jetta'

In [None]:
name = "New 'Jetta'"
name

"New 'Jetta'"

In [None]:
name = 'New "Jetta"'
name

'New "Jetta"'

In [None]:
name = '''name
  km
'''
name

'name\n  km\n'

In [None]:
type(name)

str

### None

In [None]:
car = None

In [None]:
type(car)

NoneType

# 2.4 Type conversion

In [None]:
a = 10
b = 20
c = "Python is "
d = 'NICE'

In [None]:
type(a)

int

In [None]:
type(b)

int

In [None]:
type(c)

str

In [None]:
type(d)

str

In [None]:
a + b

30

In [None]:
c + d

'Python is NICE'

Functions int(), float(), str()

In [None]:
c + str(a)

'Python is 10'

In [None]:
var_type = str(a)
type(var_type)

str

In [None]:
float(a)

10.0

In [None]:
pi = 3.14159265359

In [None]:
type(pi)

float

In [None]:
pi = int(pi)

In [None]:
type(pi)

int

In [None]:
pi

3

### Concatenating strings
#### Exercise

In [None]:
text = 'The average vehicle mileage is '
Km = 100000
current_year = 2019
Year_manufacturing = 1999

In [None]:
text + str( int( Km / (current_year - Year_manufacturing) ) ) + 'km'

'The average vehicle mileage is 5000km'

# 2.5 Indentation, comments, and *string* formatting

### Indentation

In Python, programs are structured using indentation. In any programming language, the practice of indentation is very useful, making the code easier to read and also to maintain. In Python, indentation is not just a matter of organization and style, but a language requirement.

In [None]:
current_year = 2019
brand_new_car = 2019

if (current_year == brand_new_car):
    print(True)
else:
    print(False)

True


### Comments

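Comments are extremely important in a program. They consist of text that describes what the program, or a specific part of it, is doing. Comments are ignored by the Python interpreter.

We can write single-line comments with `#`, or spread a comment over several lines. A triple-quoted string that is not assigned to anything is often used as a multi-line comment; strictly speaking, it is a string expression that Python evaluates and discards.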

In [None]:
# this is a comment
current_year = 2019
current_year

2019

In [None]:
# this
# is a  
# comment
current_year = 2019
current_year

2019

In [None]:
'''this is a
comment'''
current_year = 2019
current_year

2019

In [None]:
# Defining variables
current_year = 2019
brand_new_car = 2019

'''
Conditional structure that we will
learn in the next class
'''
if (current_year == brand_new_car):   # Testing if condition is true
    print(True)
else:                               # Testing if condition is false
    print(False)

True


### *String* formatting

## *str % value*
https://docs.python.org/3.6/library/stdtypes.html#old-string-formatting
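
The older `%` operator has no example in this section, so here is a minimal sketch of it. It reuses the names `my_name` and `random` from the cells of this section; the `price` line and its value are only illustrative.

```python
my_name = 'Rodrigo'
random = 42

# %s inserts a string and %d an integer; the values are passed as a tuple.
greeting = 'Hello %s. Your favorite number is %d' % (my_name, random)
print(greeting)

# %.2f formats a float with two decimal places.
price = 'Value: %.2f' % 88078.64
print(price)
```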

## *str.format()*

https://docs.python.org/3.6/library/stdtypes.html#str.format

In [None]:
my_name = 'Rodrigo'
random = 42

In [None]:
print('Hello {}!'.format(my_name))

Hello Rodrigo!


In [None]:
'Hello {}. Your favorite number is {}'.format(my_name, random)

'Hello Rodrigo. Your favorite number is 42'

In [None]:
'Hello {name}. Your favorite number is {number}'.format(number = random, name = my_name)

'Hello Rodrigo. Your favorite number is 42'

## *f-Strings*

https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings

In [None]:
f'Hello {my_name}. Your favorite number is {random}'

'Hello Rodrigo. Your favorite number is 42'

### String formatting
#### Exercise

In [None]:
print('Hello, {name}! This is your access number {hits}'.format(hits = 32, name = 'Rodrigo'))

Hello, Rodrigo! This is your access number 32


In [None]:
name = 'Rodrigo'
hits = 32
print(f'Hello, {name}! This is your access number {hits}')

Hello, Rodrigo! This is your access number 32


In [None]:
print('Hello, {}! This is your access number {}'.format('Rodrigo', 32))

Hello, Rodrigo! This is your access number 32


# <font color=green> 3. WORKING WITH LISTS
---

# 3.1 Creating lists

Lists are **mutable** sequences that are used to store collections of items, usually homogeneous. They can be built in several ways:
```
- Using a pair of square brackets to create an empty list: [ ]
- Using square brackets with comma-separated items: [ 1 ], [ 1, 2, 3 ]
```

In [None]:
accessories = ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

In [None]:
accessories

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor']

In [None]:
type(accessories)

list

### List with different data types

In [None]:
car_1 = ['Jetta Variant', '4.0 Turbo Engine', 2003, 44410.0, False, ['Alloy Wheels', 'Power Locks', 'Autopilot'], 88078.64]
car_2 = ['Passat', 'Diesel Engine', 1991, 5712.0, False, ['Multimedia Center', 'Panoramic Roof', 'ABS Brakes'], 106161.94]

In [None]:
type(car_1)

list

In [None]:
car_1

['Jetta Variant',
 '4.0 Turbo Engine',
 2003,
 44410.0,
 False,
 ['Alloy Wheels', 'Power Locks', 'Autopilot'],
 88078.64]

In [None]:
car_2

['Passat',
 'Diesel Engine',
 1991,
 5712.0,
 False,
 ['Multimedia Center', 'Panoramic Roof', 'ABS Brakes'],
 106161.94]

In [None]:
cars = [car_1, car_2]

In [None]:
type(cars)

list

In [None]:
cars

[['Jetta Variant',
  '4.0 Turbo Engine',
  2003,
  44410.0,
  False,
  ['Alloy Wheels', 'Power Locks', 'Autopilot'],
  88078.64],
 ['Passat',
  'Diesel Engine',
  1991,
  5712.0,
  False,
  ['Multimedia Center', 'Panoramic Roof', 'ABS Brakes'],
  106161.94]]

# 3.2 List operations

https://docs.python.org/3.6/library/stdtypes.html#common-sequence-operations

## *x in A*

Returns **True** if any element of the list *A* is equal to *x*, and **False** otherwise.

In [None]:
accessories

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor']

In [None]:
'Power locks' in accessories

True

In [None]:
'4 x 4' in accessories

False

In [None]:
'Power locks' not in accessories

False

In [None]:
'4 x 4' not in accessories

True

## *A + B*

Concatenates lists A and B.

In [None]:
A = ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats']
B = ['Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

In [None]:
A

['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats']

In [None]:
B

['Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

In [None]:
C = A + B

In [None]:
C

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor']

## *len(A)*

Returns the number of items in list *A*.

In [None]:
len(A)

4

In [None]:
len(B)

4

In [None]:
len(C)

8

### Working with lists
#### Exercise

In [None]:
car = [
     'Jetta Variant',
     '4.0 Turbo Engine',
     2003,
     44410.0,
     False,
     ['Alloy wheels', 'Power locks', 'Autopilot'],
     88078.64
]

In [None]:
'2003' in car

False

In [None]:
'Alloy wheels' in car

False

In [None]:
'False' not in car

True

# 3.3 Selections in lists

## *A[ i ]*

Returns the i-th item in the list *A*.

<font color=red>**Note:**</font> Lists have zero-based indexing.

In [None]:
accessories = ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

In [None]:
accessories[0]

'Alloy wheels'

In [None]:
accessories[1]

'Power locks'

In [None]:
accessories[-1]

'Rain sensor'

In [None]:
accessories[-2]

'Twilight sensor'

In [None]:
C[0]

'Alloy wheels'

In [None]:
car = [
     'Jetta Variant',
     '4.0 Turbo Engine',
     2003,
     44410.0,
     False,
     ['Alloy wheels', 'Power locks', 'Autopilot'],
     88078.64
]

In [None]:
car[5]

['Alloy wheels', 'Power locks', 'Autopilot']

In [None]:
car[5][0]

'Alloy wheels'

## *A[ i : j ]*

Slices list *A* from index i to j. In this slicing the element with index i is **included** and the element with index j is **not included** in the result.

In [None]:
accessories[2:5]

['Autopilot', 'Leather seats', 'Air conditioning']

In [None]:
accessories[2:]

['Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor']

In [None]:
accessories[:5]

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning']

### Selections and slicing
#### Exercise

In [None]:
letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

In [None]:
letters[:2]

['A', 'B']

In [None]:
letters[2:5]

['C', 'D', 'E']

In [None]:
letters[-3:]

['F', 'G', 'H']

### Lists within lists
#### Exercise

In [None]:
exercice_cars = [
     [
         'Jetta Variant',
         '4.0 Turbo Engine',
         2003,
         False,
         ['Alloy wheels', 'Power locks', 'Autopilot']
     ],
     [
         'Passat',
         'Diesel engine',
         1991,
         True,
         ['Multimedia centre', 'Panoramic roof', 'ABS brakes']
     ]
]

In [None]:
exercice_cars[1][3]

True

In [None]:
exercice_cars[1][-1][-2]

'Panoramic roof'

In [None]:
exercice_cars[0][-1]

['Alloy wheels', 'Power locks', 'Autopilot']

# 3.4 List methods

https://docs.python.org/3.6/library/stdtypes.html#mutable-sequence-types

In [None]:
accessories = ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

## *A.sort()*

Sorts list *A* in place.

In [None]:
accessories.sort()

In [None]:
accessories

['Air conditioning',
 'Alloy wheels',
 'Autopilot',
 'Leather seats',
 'Parking sensor',
 'Power locks',
 'Rain sensor',
 'Twilight sensor']

## *A.append(x)*

Adds the element *x* to the end of list *A*.

In [None]:
accessories.append('4 x 4')

In [None]:
accessories

['Air conditioning',
 'Alloy wheels',
 'Autopilot',
 'Leather seats',
 'Parking sensor',
 'Power locks',
 'Rain sensor',
 'Twilight sensor',
 '4 x 4']

## *A.pop(i)*

Removes and returns the element at index i of list *A*.

<font color=red>**Note:**</font> By *default* the *pop()* method removes and returns the last element of a list.

In [None]:
accessories.pop()

'4 x 4'

In [None]:
accessories.pop(3)

'Leather seats'

In [None]:
accessories

['Air conditioning',
 'Alloy wheels',
 'Autopilot',
 'Parking sensor',
 'Power locks',
 'Rain sensor',
 'Twilight sensor']

## *A.copy()*

Creates a copy of the *A* list.

<font color=red>**Note:**</font> The same result can be obtained with the following code:
```
A[:]
```
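
A copy matters because plain assignment does not create a new list. The sketch below contrasts an alias with the two copy forms mentioned above; the variable names are chosen only for illustration.

```python
original = ['Alloy wheels', 'Power locks', 'Autopilot']

alias = original               # same list object, not a copy
copy_method = original.copy()  # new list with the same items
copy_slice = original[:]       # equivalent to .copy()

alias.append('Rain sensor')    # this also changes `original`

print(original)                # four items
print(copy_method)             # still three items
print(alias is original, copy_slice is original)
```

Both `copy()` and `A[:]` make a shallow copy: nested lists inside the list are still shared between the original and the copy.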

In [None]:
accessories_copy = accessories.copy()

In [None]:
accessories_copy.pop()

'Twilight sensor'

In [None]:
accessories_copy.append('4 x 4')

In [None]:
accessories

['Air conditioning',
 'Alloy wheels',
 'Autopilot',
 'Parking sensor',
 'Power locks',
 'Rain sensor',
 'Twilight sensor']

In [None]:
accessories_copy

['Air conditioning',
 'Alloy wheels',
 'Autopilot',
 'Parking sensor',
 'Power locks',
 'Rain sensor',
 '4 x 4']

In [None]:
new_list = accessories + accessories_copy
new_list.sort()

In [None]:
new_list

['4 x 4',
 'Air conditioning',
 'Air conditioning',
 'Alloy wheels',
 'Alloy wheels',
 'Autopilot',
 'Autopilot',
 'Parking sensor',
 'Parking sensor',
 'Power locks',
 'Power locks',
 'Rain sensor',
 'Rain sensor',
 'Twilight sensor']

### Elaborations with lists
#### Exercise

In [None]:
accessories = [
     'Alloy wheels',
     'Electric Locks',
     'Automatic pilot',
     'Leather Seats',
     'Air conditioning'
]

In [None]:
accessories.append('Airbag')
accessories.sort()
accessories.pop()
accessories.append('Electric windows')
accessories

['Air conditioning',
 'Airbag',
 'Alloy wheels',
 'Automatic pilot',
 'Electric Locks',
 'Electric windows']

# <font color=green> 4. REPETITION AND CONDITIONAL STRUCTURES
---

# 4.1 *for* statement

#### Standard format

```
for <variable> in <collection>:
    <instruction>
```

### Loops with lists

In [1]:
accessories = ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']

In [2]:
for item in accessories:
  print(item)

Alloy wheels
Power locks
Autopilot
Leather seats
Air conditioning
Parking sensor
Twilight sensor
Rain sensor


### List comprehensions

https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions

*range()* -> https://docs.python.org/3.6/library/functions.html#func-range

In [3]:
range(10)

range(0, 10)

In [4]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [5]:
for i in range(10):
  print(i ** 2)

0
1
4
9
16
25
36
49
64
81


In [6]:
square = []

for i in range(10):
  square.append(i ** 2)

square

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [8]:
square = [i ** 2 for i in range(10)]
square

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# 4.2 Nested loops

In [9]:
data = [
     ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor'],
     ['Multimedia center', 'Panoramic roof', 'ABS brakes', '4 X 4', 'Digital panel', 'Autopilot', 'Leather seats', 'Parking camera'],
     ['Autopilot', 'Stability control', 'Twilight sensor', 'ABS brakes', 'Automatic transmission', 'Leather seats', 'Multimedia center', 'Power windows']
]
data

[['Alloy wheels',
  'Power locks',
  'Autopilot',
  'Leather seats',
  'Air conditioning',
  'Parking sensor',
  'Twilight sensor',
  'Rain sensor'],
 ['Multimedia center',
  'Panoramic roof',
  'ABS brakes',
  '4 X 4',
  'Digital panel',
  'Autopilot',
  'Leather seats',
  'Parking camera'],
 ['Autopilot',
  'Stability control',
  'Twilight sensor',
  'ABS brakes',
  'Automatic transmission',
  'Leather seats',
  'Multimedia center',
  'Power windows']]

In [10]:
for lists in data:
  print(lists)

['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor']
['Multimedia center', 'Panoramic roof', 'ABS brakes', '4 X 4', 'Digital panel', 'Autopilot', 'Leather seats', 'Parking camera']
['Autopilot', 'Stability control', 'Twilight sensor', 'ABS brakes', 'Automatic transmission', 'Leather seats', 'Multimedia center', 'Power windows']


In [11]:
for lists in data:
  for item in lists:
    print(item)

Alloy wheels
Power locks
Autopilot
Leather seats
Air conditioning
Parking sensor
Twilight sensor
Rain sensor
Multimedia center
Panoramic roof
ABS brakes
4 X 4
Digital panel
Autopilot
Leather seats
Parking camera
Autopilot
Stability control
Twilight sensor
ABS brakes
Automatic transmission
Leather seats
Multimedia center
Power windows


In [14]:
accessories = []

for lists in data:
  for item in lists:
    accessories.append(item)

accessories

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor',
 'Multimedia center',
 'Panoramic roof',
 'ABS brakes',
 '4 X 4',
 'Digital panel',
 'Autopilot',
 'Leather seats',
 'Parking camera',
 'Autopilot',
 'Stability control',
 'Twilight sensor',
 'ABS brakes',
 'Automatic transmission',
 'Leather seats',
 'Multimedia center',
 'Power windows']

## *set()*

https://docs.python.org/3.6/library/stdtypes.html#types-set

https://docs.python.org/3.6/library/functions.html#func-set

In [18]:
accessories_distinct = set(accessories)
accessories_distinct

{'4 X 4',
 'ABS brakes',
 'Air conditioning',
 'Alloy wheels',
 'Automatic transmission',
 'Autopilot',
 'Digital panel',
 'Leather seats',
 'Multimedia center',
 'Panoramic roof',
 'Parking camera',
 'Parking sensor',
 'Power locks',
 'Power windows',
 'Rain sensor',
 'Stability control',
 'Twilight sensor'}

In [19]:
list(accessories_distinct)

['Panoramic roof',
 '4 X 4',
 'Parking sensor',
 'Digital panel',
 'ABS brakes',
 'Stability control',
 'Automatic transmission',
 'Rain sensor',
 'Alloy wheels',
 'Parking camera',
 'Leather seats',
 'Power locks',
 'Multimedia center',
 'Power windows',
 'Autopilot',
 'Air conditioning',
 'Twilight sensor']
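
The two outputs above list the same items in different orders because a set has no defined order. A minimal sketch, assuming a sorted list of distinct items is what is wanted, using a shortened illustrative list:

```python
accessories = ['Autopilot', 'ABS brakes', 'Autopilot', 'Leather seats', 'ABS brakes']

# set() drops the duplicates; sorted() returns a new list in a predictable order.
accessories_sorted = sorted(set(accessories))
print(accessories_sorted)
```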

### List comprehensions

In [20]:
[item for lists in data for item in lists]

['Alloy wheels',
 'Power locks',
 'Autopilot',
 'Leather seats',
 'Air conditioning',
 'Parking sensor',
 'Twilight sensor',
 'Rain sensor',
 'Multimedia center',
 'Panoramic roof',
 'ABS brakes',
 '4 X 4',
 'Digital panel',
 'Autopilot',
 'Leather seats',
 'Parking camera',
 'Autopilot',
 'Stability control',
 'Twilight sensor',
 'ABS brakes',
 'Automatic transmission',
 'Leather seats',
 'Multimedia center',
 'Power windows']

In [23]:
list(set([item for lists in data for item in lists]))

# OR
all_items = [item for lists in data for item in lists]
distinct_items = set(all_items)
items = list(distinct_items)

### Iterating through lists of lists
#### Exercise

In [38]:
data_exercise = [ 
    ['A', 'B', 'C'],
    ['D', 'E', 'F'],
    ['G', 'H', 'I']
]

In [37]:
result_2 = []
for lists in data_exercise:
    result_2 += lists
result_2

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

In [40]:
[item for lists in data_exercise for item in lists]

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

# 4.3 *if* statement

#### Standard format

```
if <condition>:
     <statements if the condition is true>
```

### Comparison operators: $==$, $!=$, $>$, $<$, $>=$, $<=$
### AND
### Logical operators: $and$, $or$, $not$

In [104]:
# 1st item on the list - Vehicle name
# 2nd item on the list - Year of manufacture
# 3rd item on the list - Is the vehicle brand new (zero km)?

data = [
    ['Jetta Variant', 2003, False],
    ['Passat', 1991, False],
    ['Crossfox', 1990, False],
    ['DS5', 2019, True],
    ['Aston Martin DB4', 2006, False],
    ['Palio Weekend', 2012, False],
    ['A5', 2019, True],
    ['Série 3 Cabrio', 2009, False],
    ['Dodge Jorney', 2019, False],
    ['Carens', 2011, False],
    ['Corolla', 2021, True]
]
data

[['Jetta Variant', 2003, False],
 ['Passat', 1991, False],
 ['Crossfox', 1990, False],
 ['DS5', 2019, True],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['A5', 2019, True],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False],
 ['Corolla', 2021, True]]

In [105]:
for vehicle in data:
  if vehicle[-1]:
    print(vehicle)

['DS5', 2019, True]
['A5', 2019, True]
['Corolla', 2021, True]


In [106]:
brand_new_car = []

for vehicle in data:
  if vehicle[2]:
    brand_new_car.append(vehicle)

brand_new_car

[['DS5', 2019, True], ['A5', 2019, True], ['Corolla', 2021, True]]

In [107]:
used_car = []

for vehicle in data:
  if not vehicle[2]:
    used_car.append(vehicle)

used_car

[['Jetta Variant', 2003, False],
 ['Passat', 1991, False],
 ['Crossfox', 1990, False],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False]]

### List comprehensions

In [108]:
[vehicle for vehicle in data if vehicle[2]]

[['DS5', 2019, True], ['A5', 2019, True], ['Corolla', 2021, True]]

In [109]:
[vehicle for vehicle in data if not vehicle[2]]

[['Jetta Variant', 2003, False],
 ['Passat', 1991, False],
 ['Crossfox', 1990, False],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False]]

# 4.4 *if-else* and *if-elif-else* statements

#### Standard format

```
if <condition>:
    <statements if the condition is true>
else:
    <instructions if the condition is not true>
```

In [110]:
brand_new_car =[]
used_car = []

for vehicle in data:
  if vehicle[2]:
    brand_new_car.append(vehicle)
  else:
    used_car.append(vehicle)

In [111]:
brand_new_car

[['DS5', 2019, True], ['A5', 2019, True], ['Corolla', 2021, True]]

In [112]:
used_car

[['Jetta Variant', 2003, False],
 ['Passat', 1991, False],
 ['Crossfox', 1990, False],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False]]

#### Standard format

```
if <condition 1>:
     <instructions if condition 1 is true>
elif <condition 2>:
     <instructions if condition 2 is true>
elif <condition 3>:
     <instructions if condition 3 is true>
                        .
                        .
                        .
else:
    <instructions if the above conditions are not true>
```
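
As a minimal sketch of this format, the chain can be wrapped in a function. The function `classify_by_year` and its labels are hypothetical and chosen only for illustration.

```python
def classify_by_year(year):
    """Return a label for a manufacturing year (hypothetical helper)."""
    if year <= 2000:
        return 'up to 2000'
    elif year <= 2019:
        # Reached only when the first condition is false, so year > 2000 here.
        return '2001 to 2019'
    else:
        return 'after 2019'

labels = [classify_by_year(year) for year in (1991, 2003, 2021)]
print(labels)
```

The conditions are tested from top to bottom and only the first true branch runs.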

In [83]:
print('AND')
print(f'(True and True) the result is: {True and True}')
print(f'(True and False) the result is: {True and False}')
print(f'(False and True) the result is: {False and True}')
print(f'(False and False) the result is: {False and False}')

AND
(True and True) the result is: True
(True and False) the result is: False
(False and True) the result is: False
(False and False) the result is: False


In [84]:
print('OR')
print(f'(True or True) the result is: {True or True}')
print(f'(True or False) the result is: {True or False}')
print(f'(False or True) the result is: {False or True}')
print(f'(False or False) the result is: {False or False}')

OR
(True or True) the result is: True
(True or False) the result is: True
(False or True) the result is: True
(False or False) the result is: False


In [113]:
data

[['Jetta Variant', 2003, False],
 ['Passat', 1991, False],
 ['Crossfox', 1990, False],
 ['DS5', 2019, True],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['A5', 2019, True],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False],
 ['Corolla', 2021, True]]

In [116]:
a, b, c = [], [], []

for item in data:
  if item[1] <= 2000:
    a.append(item)
  elif item[1] > 2000 and item[1] <= 2019:
    b.append(item)
  else:
    c.append(item)

In [117]:
a

[['Passat', 1991, False], ['Crossfox', 1990, False]]

In [118]:
b

[['Jetta Variant', 2003, False],
 ['DS5', 2019, True],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['A5', 2019, True],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False]]

In [119]:
c

[['Corolla', 2021, True]]

In [120]:
a, b, c = [], [], []

for item in data:
  if item[1] <= 2000:
    a.append(item)
  elif 2000 < item[1] <= 2019: # Chained comparison: same as item[1] > 2000 and item[1] <= 2019
    b.append(item)
  else:
    c.append(item)

In [123]:
a

[['Passat', 1991, False], ['Crossfox', 1990, False]]

In [122]:
b

[['Jetta Variant', 2003, False],
 ['DS5', 2019, True],
 ['Aston Martin DB4', 2006, False],
 ['Palio Weekend', 2012, False],
 ['A5', 2019, True],
 ['Série 3 Cabrio', 2009, False],
 ['Dodge Jorney', 2019, False],
 ['Carens', 2011, False]]

In [124]:
c

[['Corolla', 2021, True]]

# <font color=green> 5. BASIC NUMPY
---

Numpy is short for Numerical Python and is one of the most important packages for numerical processing in Python. It provides the data structures and algorithms that underpin most scientific Python packages that work with numeric data. Some of its main features are:

- A powerful multidimensional array object;
- Sophisticated mathematical functions for operations with arrays without the need to use *for* loops;
- Linear algebra and random number generation features.

In addition to its obvious scientific uses, the NumPy package is also widely used in data analysis as an efficient multidimensional container of generic data for transport between various Python algorithms and libraries.

**Version:** 1.16.5

**Installation:** https://scipy.org/install.html

**Documentation:** https://numpy.org/doc/1.16/

### Packages

There are many Python packages available for download on the internet. Each package aims to solve a certain type of problem and, to do so, provides new types, functions and methods.

Some packages are widely used in a data science context such as:

- Numpy
- pandas
- Scikit-learn
- Matplotlib

Some packages are not shipped with the default Python installation. In this case we must install the packages we need on our system in order to use their features.

### Importing full package

https://numpy.org/doc/1.16/reference/generated/numpy.arange.html

### Importing full package and assigning a new name

### Importing part of the package
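
The three import styles named in the headings above can be sketched as follows, using `arange` (the function linked above) as the example. All three calls produce the same array.

```python
import numpy                # full package: names are reached as numpy.<name>
import numpy as np          # full package under a shorter name
from numpy import arange    # only one function from the package

a = numpy.arange(10)
b = np.arange(10)
c = arange(10)

print(a)
```

The alias `np` is the convention used in the rest of this notebook.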

# 5.1 Creating Numpy arrays

### From lists

https://numpy.org/doc/1.16/user/basics.creation.html

https://numpy.org/doc/1.16/user/basics.types.html
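
A minimal sketch of building an array from a list with `np.array`, reusing mileage values that appear elsewhere in this notebook. The optional `dtype` argument sets the element type explicitly.

```python
import numpy as np

# The element type is inferred from the list items (integers here).
km = np.array([44410, 5712, 37123, 0, 25757])

# dtype forces a specific element type.
km_float = np.array([44410, 5712, 37123, 0, 25757], dtype=np.float64)

print(km.dtype)        # an integer type; the exact width depends on the platform
print(km_float.dtype)  # float64
```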

### From external data

https://numpy.org/doc/1.16/reference/generated/numpy.loadtxt.html
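
`np.loadtxt` reads a text file into an array. So that the sketch below runs without any file on disk, it uses an in-memory `io.StringIO` object as a stand-in for a file such as `cars-km.txt`; the one-value-per-line layout is an assumption about that file.

```python
import io
import numpy as np

# Stand-in for a text file with one value per line (assumed layout).
fake_file = io.StringIO("44410\n5712\n37123\n0\n25757\n")

km = np.loadtxt(fake_file)                 # float64 by default
fake_file.seek(0)                          # rewind to read the same data again
km_int = np.loadtxt(fake_file, dtype=int)  # ask for integers instead

print(km)
print(km_int)
```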

### Two-dimensional arrays

In [None]:
data = [
     ['Alloy wheels', 'Power locks', 'Autopilot', 'Leather seats', 'Air conditioning', 'Parking sensor', 'Twilight sensor', 'Rain sensor'],
     ['Multimedia center', 'Panoramic roof', 'ABS brakes', '4 X 4', 'Digital panel', 'Autopilot', 'Leather seats', 'Parking camera'],
     ['Autopilot', 'Stability control', 'Twilight sensor', 'ABS brakes', 'Automatic transmission', 'Leather seats', 'Multimedia center', 'Power windows']
]
data

[['Alloy wheels',
  'Power locks',
  'Autopilot',
  'Leather seats',
  'Air conditioning',
  'Parking sensor',
  'Twilight sensor',
  'Rain sensor'],
 ['Multimedia center',
  'Panoramic roof',
  'ABS brakes',
  '4 X 4',
  'Digital panel',
  'Autopilot',
  'Leather seats',
  'Parking camera'],
 ['Autopilot',
  'Stability control',
  'Twilight sensor',
  'ABS brakes',
  'Automatic transmission',
  'Leather seats',
  'Multimedia center',
  'Power windows']]
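
A list of lists whose inner lists have the same length can be turned into a two-dimensional array. The sketch below uses a shortened version of `data` to keep the example small.

```python
import numpy as np

data = [
    ['Alloy wheels', 'Power locks', 'Autopilot'],
    ['Multimedia center', 'Panoramic roof', 'ABS brakes'],
]

accessories = np.array(data)

print(accessories.shape)   # (2, 3): two rows, three columns
print(accessories.ndim)    # 2
```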

### Comparing performance with lists
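
One way to compare the two is the standard library `timeit` module. This is only a sketch: the array size and the number of repetitions are arbitrary, and the timings depend on the machine, so no result is shown here.

```python
import timeit
import numpy as np

n = 100000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Both expressions compute the square of every element.
squares_list = [x ** 2 for x in py_list]
squares_array = np_array ** 2

list_time = timeit.timeit(lambda: [x ** 2 for x in py_list], number=10)
array_time = timeit.timeit(lambda: np_array ** 2, number=10)

print(f'list:  {list_time:.4f} s')
print(f'array: {array_time:.4f} s')
```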

# 5.2 Arithmetic operations with Numpy arrays

### Operations between arrays and constants

In [None]:
km = [44410., 5712., 37123., 0., 25757.]
years = [2003, 1991, 1990, 2019, 2006]

In [None]:
import numpy as np

km = np.array([44410., 5712., 37123., 0., 25757.])
years = np.array([2003, 1991, 1990, 2019, 2006])
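
With arrays, an operation between the array and a constant is applied to every element. A minimal sketch with the same `km` and `years` values; the names `age` and `km_thousands` are only illustrative.

```python
import numpy as np

km = np.array([44410., 5712., 37123., 0., 25757.])
years = np.array([2003, 1991, 1990, 2019, 2006])

# The constant is combined with each element of the array.
age = 2019 - years
km_thousands = km / 1000

print(age)
print(km_thousands)
```

The same expression with the plain Python lists, `2019 - years`, raises a `TypeError`.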

### Operations between arrays
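
When both operands are arrays of the same shape, the operation is applied element by element. The sketch below computes the average mileage per year; `np.errstate` is used here only to silence the warning for the car with zero km and zero years, which yields `nan`.

```python
import numpy as np

km = np.array([44410., 5712., 37123., 0., 25757.])
years = np.array([2003, 1991, 1990, 2019, 2006])

age = 2019 - years

# 0 / 0 produces nan; errstate suppresses the RuntimeWarning for it.
with np.errstate(divide='ignore', invalid='ignore'):
    km_average = km / age

print(km_average)
```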

### Operations with two-dimensional arrays

![1410-img01.png](https://caelum-online-public.s3.amazonaws.com/1410-pythondatascience/01/1410-img01.png)
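
As a sketch of the same idea in two dimensions, assume an array whose first row holds the mileage and whose second row holds the year of manufacture (this layout is an assumption made for the example).

```python
import numpy as np

# Row 0: km, row 1: year of manufacture (assumed layout).
data = np.array([[44410., 5712., 37123.],
                 [2003., 1991., 1990.]])

# An operation with a constant reaches every row and column.
doubled = data * 2

# Rows can be combined with each other like one-dimensional arrays.
km_average = data[0] / (2019 - data[1])

print(doubled)
print(km_average)
```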

# 5.3 Selections with Numpy arrays

![1410-img01.png](https://caelum-online-public.s3.amazonaws.com/1410-pythondatascience/01/1410-img01.png)

![1410-img02.png](https://caelum-online-public.s3.amazonaws.com/1410-pythondatascience/01/1410-img02.png)

### Indexing

<font color=red>**Note:**</font> Indexing starts at zero.

## <font color=green>**Hint:**</font>
### *ndarray[ row ][ column ]* or *ndarray[ row, column ]*
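
A minimal sketch of both forms, again assuming an array with the mileage in row 0 and the year in row 1:

```python
import numpy as np

data = np.array([[44410., 5712., 37123., 0., 25757.],
                 [2003., 1991., 1990., 2019., 2006.]])

first_row = data[0]        # the whole first row
item_a = data[1][2]        # row 1, column 2
item_b = data[1, 2]        # same element, single indexing operation
last_item = data[-1, -1]   # negative indexes count from the end

print(first_row, item_a, item_b, last_item)
```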

### Slicing
 
The syntax for slicing a Numpy array is $i : j : k$, where $i$ is the start index, $j$ is the stop index, and $k$ is the step ($k\neq0$).
 
<font color=red>**Note:**</font> In a slice, the item with index i is **included** and the item with index j is **not included** in the result.

![1410-img01.png](https://caelum-online-public.s3.amazonaws.com/1410-pythondatascience/01/1410-img01.png)
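
The slicing rules can be sketched with a small counter array and with a two-dimensional array; the values are illustrative.

```python
import numpy as np

counter = np.arange(10)

first = counter[1:4]       # indexes 1, 2 and 3
stepped = counter[1:8:2]   # from 1 up to (not including) 8, in steps of 2

data = np.array([[44410., 5712., 37123., 0., 25757.],
                 [2003., 1991., 1990., 2019., 2006.]])

# ":" alone selects every row; 1:3 selects columns 1 and 2.
middle = data[:, 1:3]

print(first, stepped)
print(middle)
```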

### Indexing with boolean array

<font color=red>**Note:**</font> Selects the elements at the positions where the boolean array is True.
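
A minimal sketch: a comparison on an array produces a boolean array, which can then be used as an index. The name `mask` is only illustrative.

```python
import numpy as np

km = np.array([44410., 5712., 37123., 0., 25757.])
years = np.array([2003, 1991, 1990, 2019, 2006])

# The comparison is applied to each element and yields True or False.
mask = years > 2000

# Only the positions where mask is True are kept.
km_recent = km[mask]
years_high_km = years[km > 20000]

print(mask)
print(km_recent)
print(years_high_km)
```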

# 5.4 Numpy arrays attributes and methods

### Attributes

https://numpy.org/doc/1.16/reference/arrays.ndarray.html#array-attributes

## *ndarray.shape*

Returns a tuple with the dimensions of the array.

## *ndarray.ndim*

Returns the number of dimensions in the array.

## *ndarray.size*

Returns the number of elements in the array.

## *ndarray.dtype*

Returns the data type of array elements.

## *ndarray.T*

Returns the transposed array, that is, converts rows to columns and vice versa.
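
The attributes above can be sketched on a small two-dimensional array (two rows, five columns):

```python
import numpy as np

data = np.array([[44410., 5712., 37123., 0., 25757.],
                 [2003., 1991., 1990., 2019., 2006.]])

print(data.shape)    # (2, 5)
print(data.ndim)     # 2
print(data.size)     # 10
print(data.dtype)    # float64
print(data.T.shape)  # (5, 2)
```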

### Methods

https://numpy.org/doc/1.16/reference/arrays.ndarray.html#array-methods

## *ndarray.tolist()*

Returns the array as a Python list.

## *ndarray.reshape(shape[, order])*

Returns an array containing the same data with a new shape.

In [None]:
km = [44410, 5712, 37123, 0, 25757]
years = [2003, 1991, 1990, 2019, 2006]
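
A sketch of `reshape` and `tolist` with these two lists. Joining them into one list of ten items is an assumption about how the example was meant to continue; the names `info`, `by_rows` and `by_cols` are illustrative.

```python
import numpy as np

km = [44410, 5712, 37123, 0, 25757]
years = [2003, 1991, 1990, 2019, 2006]

info = km + years   # list concatenation: ten items

# Default order ('C') fills the new shape row by row.
by_rows = np.array(info).reshape((2, 5))

# order='F' fills the new shape column by column.
by_cols = np.array(info).reshape((5, 2), order='F')

print(by_rows)
print(by_cols)
print(by_rows.tolist())   # back to a (nested) Python list
```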

## *ndarray.resize(new_shape[, refcheck])*

Changes the shape and size of the array in place.
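
Unlike `reshape`, `resize` modifies the array itself and may change the number of elements. In the sketch below, `refcheck=False` is passed so that the call does not fail when other references to the array exist, which is common in interactive sessions.

```python
import numpy as np

data = np.array([44410, 5712, 37123, 0, 25757])

# Five elements become a 2 x 3 array; the extra position is filled with zero.
data.resize((2, 3), refcheck=False)

print(data)
```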

# 5.5 Statistics with Numpy arrays

https://numpy.org/doc/1.16/reference/arrays.ndarray.html#calculation

AND

https://numpy.org/doc/1.16/reference/routines.statistics.html

AND

https://numpy.org/doc/1.16/reference/routines.math.html

In [None]:
years = np.loadtxt(fname = "cars-years.txt", dtype = int)
km = np.loadtxt(fname = "cars-km.txt")
value = np.loadtxt(fname = "cars-value.txt")

https://numpy.org/doc/1.16/reference/generated/numpy.column_stack.html
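
`np.column_stack` joins one-dimensional arrays as the columns of a two-dimensional array. The sketch uses two cars with values taken from earlier cells instead of the text files, so it runs on its own; the name `dataset` is illustrative.

```python
import numpy as np

years = np.array([2003, 1991])
km = np.array([44410., 5712.])
value = np.array([88078.64, 106161.94])

# Each input array becomes one column: years, km, value.
dataset = np.column_stack((years, km, value))

print(dataset.shape)   # (2, 3): one row per car
print(dataset)
```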

## *np.mean()*

Returns the average of array elements along the specified axis.

## *np.std()*

Returns the standard deviation of array elements along the specified axis.

## *ndarray.sum()*

Returns the sum of array elements along the specified axis.

## *np.sum()*

Returns the sum of array elements along the specified axis.
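
The functions above can be sketched on a small array with one row per car and the columns year, km and value (an assumed layout, with values from earlier cells). With `axis=0` the calculation runs down each column; with `axis=1` it runs across each row.

```python
import numpy as np

# Columns: year, km, value (assumed layout).
dataset = np.array([[2003., 44410., 88078.64],
                    [1991., 5712., 106161.94]])

column_means = np.mean(dataset, axis=0)   # one mean per column
km_mean = np.mean(dataset[:, 1])          # mean of the km column only
km_std = np.std(dataset[:, 1])            # population standard deviation (ddof=0)

# The method and the function give the same result.
column_sums = dataset.sum(axis=0)
column_sums_np = np.sum(dataset, axis=0)
row_sums = dataset.sum(axis=1)

print(column_means)
print(km_mean, km_std)
print(column_sums, row_sums)
```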