# Introduction to Python Programming

- This notebook is for you if you have never written a single line of Python code, and you are interested in learning machine learning.

- In this notebook, you will learn how to use Python code to get a computer to perform certain tasks for you.

- Python is one of the most popular programming languages for machine learning and data science, and it's the language you'll learn in this notebook.

## 🖨️ Printing

- One of the simplest (and most important!) tasks you can ask a computer to do is to print a message.

- In Python, we ask a computer to print a message for us by writing `print()` and putting the message inside the parentheses, enclosed in quotation marks. Below, we ask the computer to print the message `Hello, world!`.

In [None]:
print("Hello, world!")

Hello, world!


The code is inside the box (known as a **code cell**), and the computer's response (called the **output** of the code) is shown below the box. As you can see, the computer printed the message that we wanted.

## ➗ Arithmetic

We can also print the value of some arithmetic operation (such as addition, subtraction, multiplication, or division).

For instance, in the next code cell, the computer adds 2 to 1 and then prints the result, which is 3. Note that unlike when we were simply printing text, we don't use any quotation marks.

In [None]:
print(1 + 2)

3


We can also do subtraction in Python. The next code cell subtracts 5 from 9 and prints the result, which is 4.

In [None]:
print(9 - 5)

4


You can actually do a lot of calculations with Python! See the table below for some examples.

| Operation | Symbol | Example |
| --------- | ------ | ------- |
| Addition | `+` | 1 + 2 = 3 |
| Subtraction | `-` | 5 - 4 = 1 |
| Multiplication | `*` | 2 * 4 = 8 |
| Division | `/` | 6 / 3 = 2.0 |
| Exponent | `**` | 3 ** 2 = 9 |

You can control the order of operations in long calculations with parentheses.

In [None]:
print(((1 + 1) * (9 - 2) / 2) ** 2)

49.0
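
The `.0` in the output above appears because, in Python 3, the `/` operator always produces a number with a decimal point, even when the division comes out even. As a small sketch (not part of the original notebook), here is each operator from the table applied to whole numbers:

```python
# Each operator from the table, applied to small whole numbers
print(1 + 2)    # 3
print(5 - 4)    # 1
print(2 * 4)    # 8
print(6 / 3)    # 2.0 (division always gives a result with a decimal point)
print(3 ** 2)   # 9
```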


In general, Python follows the [PEMDAS rule](https://www.mathsisfun.com/operation-order-pemdas.html) when deciding the order of operations.
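
To see the order of operations at work, the sketch below runs a few short calculations with and without parentheses. The numbers are chosen only for illustration.

```python
# Multiplication happens before addition
print(2 + 3 * 4)      # 14

# Parentheses force the addition to happen first
print((2 + 3) * 4)    # 20

# Exponents happen before multiplication
print(2 * 3 ** 2)     # 18

# Division steps of equal rank run from left to right
print(8 / 4 / 2)      # 1.0
```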

## 💭 Comments

We use **comments** to annotate what code is doing. They help other people to understand your code, and they can also be helpful if you haven't looked at your own code in a while. So far, the code that we have written is very short, but annotations become more important when you have written a lot of code.

For instance, in the next code cell, we multiply 3 by 2. We also add a comment (`# Multiply 3 by 2`) above the code to describe what the code is doing.

In [None]:
# Multiply 3 by 2
print(3 * 2)

6


To indicate to Python that a line is a comment (and not Python code), you need to write a pound sign (`#`) as the very first character.

Once Python sees the pound sign and recognizes that the line is a comment, it ignores the line completely. This is important because, just like English or Hindi (or any other language!), Python is a language with very strict rules that need to be followed. Python is stricter than a human listener, though, and will stop with an error if it can't understand the code.

We can see an example of this in the code cell below. Python raises an error if we remove the pound sign, because the text in the comment is not valid Python code, so it can't be interpreted properly.

In [None]:
Multiply 3 by 2

SyntaxError: invalid syntax (<ipython-input-7-a91ee14dd7d4>, line 1)
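
A comment does not have to take up a whole line. As a small sketch (not part of the original notebook), Python also ignores everything from a `#` to the end of the line, so a short comment can follow code on the same line:

```python
# A comment on its own line
print(3 * 2)  # A comment after code; Python ignores everything from the # onward
```

The cell still prints `6`, because only the text after the `#` is skipped.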

## 🔤 Variables

So far, you have used code to make a calculation and print the result, and the result isn't saved anywhere. However, you can imagine that you might want to save the result to work with it later. For this, you'll need to use variables.

### Creating variables

The next code cell creates a variable named `test_var` and assigns it the value that we get when we add 5 to 4.

We then print the value that is assigned to the variable, which is 9.

In [None]:
# Create a variable called test_var and give it a value of 4+5
test_var = 4 + 5

# Print the value of test_var
print(test_var)

9


In general, to work with a variable, you need to begin by selecting the name you want to use. Variable names are ideally short and descriptive. They also need to satisfy several requirements:

- They can't have spaces (e.g., `test var` is not allowed)
- They can only include letters, numbers, and underscores (e.g., `test_var!` is not allowed)
- They have to start with a letter or underscore (e.g., `1_var` is not allowed)
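
One way to try out these rules without causing an error is the built-in string method `isidentifier()`, which reports whether a piece of text would be accepted as a name. This is only a sketch for experimenting. It puts each candidate name in quotation marks, a feature covered later in the Strings part.

```python
# True means the text follows the naming rules; False means it breaks one
print("test_var".isidentifier())    # True
print("test var".isidentifier())    # False (contains a space)
print("test_var!".isidentifier())   # False (contains an exclamation mark)
print("1_var".isidentifier())       # False (starts with a number)
```

Note that this check does not flag words that Python reserves for itself, such as `def`, so it is a quick test rather than a complete one.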

Then, to create the variable, you need to use `=` to assign the value that you want it to have.

You can always take a look at the value assigned to the variable by using `print()` and putting the name of the variable in parentheses.

Over time, you'll learn how to select good names for Python variables. It's completely fine for it to feel uncomfortable now, and the best way to learn is just by viewing a lot of Python code!

### Manipulating variables

You can always change the value assigned to a variable by overriding the previous value.

In the code cell below, we change the value of `my_var` from 3 to 100.

In [None]:
# Set the value of a new variable to 3
my_var = 3

# Print the value assigned to my_var
print("Value assigned:")
print(my_var)

# Change the value of the variable to 100
my_var = 100

# Print the new value assigned to my_var
print("New Value:")
print(my_var)

Value assigned:
3
New Value:
100


Note that in general, whenever you define a variable in a code cell, all of the code cells that follow also have access to the variable. For instance, we use the next code cell to access the values of `my_var` (from the code cell above) and `test_var` (from earlier in this notebook).

In [None]:
print(my_var)
print(test_var)

100
9


The next code cell tells Python to increase the current value of `my_var` by 3.

To do this, we still need to use `my_var =` like before. And also just like before, the new value we want to assign to the variable is to the right of the `=` sign.

In [None]:
# Increase the value by 3
my_var = my_var + 3

# Print the value assigned to my_var
print(my_var)

103
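
Python also has a shorthand for this kind of update, written `+=`. The sketch below uses a hypothetical variable `counter` (not part of the original notebook) so that it does not change the value of `my_var`.

```python
# A hypothetical variable, used only for this illustration
counter = 100

# The long form, as in the code cell above
counter = counter + 3

# The shorthand form, which has the same effect
counter += 3

print(counter)  # 106
```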


### Using multiple variables

It's common for code to use multiple variables. This is especially useful when we have to do a long calculation with multiple inputs.

In the next code cell, we calculate the number of seconds in four years. This calculation uses five inputs.

In [None]:
# Create variables
num_years = 4
days_per_year = 365
hours_per_day = 24
mins_per_hour = 60
secs_per_min = 60

# Calculate number of seconds in four years
total_secs = secs_per_min * mins_per_hour * hours_per_day * days_per_year * num_years
print(total_secs)

126144000


As calculated above, there are 126144000 seconds in four years.

Note that it is *possible* to do this calculation without variables, as just `60 * 60 * 24 * 365 * 4`, but that version is harder to read, so it is much harder to check for errors. When we use variables (such as `num_years`, `days_per_year`, etc.), we can better keep track of each part of the calculation and more easily check for and correct any mistakes.

Note that it is particularly useful to use variables when the values of the inputs can change. For instance, say we want to slightly improve our estimate by updating the value of the number of days in a year from 365 to 365.25, to account for leap years. Then we can change the value assigned to `days_per_year` without changing any of the other variables and redo the calculation.

In [None]:
# Update to include leap years
days_per_year = 365.25

# Calculate number of seconds in four years
total_secs = secs_per_min * mins_per_hour * hours_per_day * days_per_year * num_years
print(total_secs)

126230400.0


Note: You might have noticed the `.0` added at the end of the number, which might look unnecessary. It appears because the second calculation used a number with a fractional part (365.25), whereas the first calculation multiplied only numbers with no fractional part.
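
The same effect shows up in a much smaller calculation. In this sketch, the only difference between the two lines is whether one of the inputs has a fractional part:

```python
# Only whole numbers: the result is a whole number
print(365 * 4)       # 1460

# One input has a fractional part: the result is written with a decimal point
print(365.25 * 4)    # 1461.0
```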

### Debugging

One common error when working with variables is to accidentally introduce typos. For instance, if we spell `hours_per_day` as `hours_per_dy`, Python raises an error with the message `NameError: name 'hours_per_dy' is not defined`.

In [None]:
print(hours_per_dy)

NameError: name 'hours_per_dy' is not defined

When you see a `NameError` like this, it's an indication that you should check how you have spelled the variable that it references as "not defined". Then, to fix the error, you need only correct the spelling.

In [None]:
print(hours_per_day)

24


### 👟 Exercise

Use the next code cell to:
- Define a variable `births_per_min` and set it to 250.  (There are on average 250 babies born each minute.)
- Define a variable `births_per_day` that contains the average number of babies born each day.  (To set the value of this variable, you should use `births_per_min` and some of the variables defined earlier in this notebook.)


In [None]:
# TODO: Set the value of the births_per_min variable
births_per_min = 250

# TODO: Set the value of the births_per_day variable
births_per_day = births_per_min * mins_per_hour * hours_per_day

# TODO: Print the value of the births_per_day variable
print(births_per_day)

360000


## 🤖 Functions

In this part, you will learn how to organize your code with functions. A function is a block of code designed to perform a specific task. As you'll see, functions will let you do roughly the same calculation multiple times without duplicating any code.

### Simple Example

We begin with a simple example of a function. The `add_three()` function below accepts any number, adds three to it, and then returns the result.

In [None]:
# Define the function
def add_three(input_var):
    output_var = input_var + 3
    return output_var

Every function is composed of two pieces: a header and a body.

![Functions](https://github.com/rugvedmhatre/machine-learning-summer/blob/main/assets/images/ml-summer-school/day-1/functions.png?raw=true)

__Header__

The function header defines the name of the function and its argument(s).

- Every function header begins with `def`, which tells Python that we are about to define a function.
- In the example, the function name is `add_three`.
- In the example, the argument is `input_var`. The **argument** is the name of the variable that will be used as input to the function. It is always enclosed in parentheses that appear immediately after the name of the function. (*Note that a function can also have no arguments, or it can have multiple arguments. You'll see some examples of this later in the lesson.*)
- For every function, the parentheses enclosing the function argument(s) must be followed by a colon `:`.

__Body__

The function body specifies the work that the function does.

- Every line of code in the function body must be indented, and by the same amount. The convention, followed throughout this notebook, is four spaces, which you can type by pushing the space bar four times or, in most notebook editors, by hitting the "Tab" key once. (*As you learn more about Python, you may need to indent your code by more than four spaces, but you'll learn more about that later in this course.*)
- The function does its work by running all of the indented lines from top to bottom.
  - It takes the argument as input, which in the example is `input_var`.
  - The function creates a new variable `output_var` with the calculation `output_var = input_var + 3`.
  - Then, the final line of code, called the **return statement**, just returns the value in `output_var` as the function's output.

The code cell above just defines the function, but does not run it. The details of the function body will make more sense after the next code cell, when we actually run the function.

### How to run (or "call") a function

When we run a function, it can also be referred to as "calling" the function.

In the code cell below, we run the function with `10` as the input value. We define a new variable `new_number` which is set to the output of the function.

In [None]:
# Run the function with 10 as input
new_number = add_three(10)

# Check that the value is 13, as expected
print(new_number)

13


![Functions - 2](https://github.com/rugvedmhatre/machine-learning-summer/blob/main/assets/images/ml-summer-school/day-1/functions_2.png?raw=true)

In more detail,

- `add_three(10)` is the value that we get as output when we supply `10` as the value for `input_var` and call the `add_three()` function. When the function runs, it runs all of the code in its body, from top to bottom:
  - It first calculates `output_var = input_var + 3`, which sets `output_var = 13`.
  - The final line of code is the return statement, which returns the value of `output_var`, which is `13`.
- By setting `new_number = add_three(10)`, we set `new_number = 13`.

**Note:** When we casually refer to the `add_three()` function in this notebook, we use empty closing parentheses after the function name. This is consistent with how people generally write explanations of Python code, and the empty parentheses just make it clear that we are referring to a function, as opposed to a variable or another Python object. These parentheses should always be empty, even if the function has arguments.

### Naming functions

In the example above, the name of the function was selected for you. When naming your own functions, you should use only lowercase letters, with words separated by underscores instead of spaces.

Naming functions will feel natural over time, and it is normal for it to feel uncomfortable at first. The best way to learn is by viewing a lot of Python code.

### A more complex example

Now that you understand the basics, we can move on to an example with a longer calculation.

Say you are helping a friend to calculate their weekly paycheck after taxes.

- They're in a 12% tax bracket (in other words, 12% of their salary is taken for taxes, and they only take home 88%), and
- They're paid hourly, at a rate of $15/hour.

The function below calculates the paycheck based on the number of hours worked. The function is more complicated than the first example, because it has more lines of code and comments. Similar to the example above, the function has a single argument (`num_hours`). In the function body, we:

- Use the value for `num_hours` to specify the value for a new variable `pay_pretax`.
- Use the value of `pay_pretax` to specify the value for a new variable `pay_aftertax`.
- Return the value of the `pay_aftertax` variable.

In [None]:
def get_pay(num_hours):
    # Pre-tax pay, based on receiving $15/hour
    pay_pretax = num_hours * 15

    # After-tax pay, based on being in 12% tax bracket
    pay_aftertax = pay_pretax * (1 - 0.12)

    return pay_aftertax

We call this function the same way we called the first function. The next code cell calculates the paycheck, based on working 40 hours. (After taxes, it is $528.)

In [None]:
# Calculate pay based on working 40 hours
pay_fulltime = get_pay(40)
print(pay_fulltime)

528.0


To quickly calculate pay based on a different number of hours worked, you need to supply the function with a different number. For instance, say your friend works 32 hours. (Then, they get $422.40.)

In [None]:
pay_parttime = get_pay(32)
print(pay_parttime)

422.4


Because you wrote a function, you can calculate pay for different hours without having to rewrite all of the code in the calculations all over again.

Functions can help you to avoid errors in your code and save you a lot of time. In general, when coding, you should aim to write as little as possible, because each time you type out a calculation, it's another opportunity to accidentally introduce a typo or error.

### Variable "scope"

Variables defined inside the function body cannot be accessed outside of the function. For instance, the next code cell errors, because `pay_aftertax` only exists inside the function.

In [None]:
print(pay_aftertax)

NameError: name 'pay_aftertax' is not defined

You will get the same error if you try to print `pay_pretax` or `num_hours`. For this reason, if you need any information from a function, you need to make sure that it appears in the return statement at the end of the function.

We refer to a variable's **scope** as the part of the code where it is accessible. Variables defined inside a function (like `pay_aftertax`) have a local scope of that function only. However, as you've seen, variables defined outside all functions (like `pay_parttime`) have a global scope and can be accessed anywhere.
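
The sketch below shows both kinds of scope side by side. The names `bonus`, `add_bonus()`, and `total` are hypothetical and used only for this illustration. It also uses `try` and `except`, which this notebook has not covered: they catch the error so that the cell can print a message instead of stopping.

```python
# Defined outside any function: global scope
bonus = 50

def add_bonus(pay):
    # total is defined inside the function: local scope
    total = pay + bonus   # the function can read the global variable bonus
    return total

print(add_bonus(100))  # 150

# total does not exist out here, so asking for it raises a NameError
try:
    print(total)
except NameError as error:
    print("NameError:", error)
```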

### Functions with multiple arguments

So far, you have learned how to define a function with just one argument.  To define a function with multiple arguments, you only need to add more arguments inside the parentheses in the function header and separate them with a comma.

We do this with the `get_pay_with_more_inputs()` function below, which calculates a weekly paycheck based on three arguments:
- `num_hours` - number of hours worked in one week
- `hourly_wage` - the hourly wage (in $/hour)
- `tax_bracket` - fraction of your salary that is removed for taxes (for instance, `0.22` for a 22% tax bracket)

In [None]:
def get_pay_with_more_inputs(num_hours, hourly_wage, tax_bracket):
    # Pre-tax pay
    pay_pretax = num_hours * hourly_wage
    # After-tax pay
    pay_aftertax = pay_pretax * (1 - tax_bracket)
    return pay_aftertax

Then, to call the function, you need to provide one value for each input, again separated by a comma.

In the code cell below, we calculate the pay after taxes for someone who works 40 hours, makes $24/hour, and is in a 22% tax bracket.

In [None]:
higher_pay_aftertax = get_pay_with_more_inputs(40, 24, .22)
print(higher_pay_aftertax)

748.8000000000001
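
The long tail of digits in `748.8000000000001` is not a mistake in the function. Computers store floats in binary, and many decimal values (such as 0.22) can only be stored approximately, so tiny rounding differences can show up in results. A minimal sketch of tidying the value for display with the built-in `round()` function follows. The variable names `raw_pay` and `tidy_pay` are illustrative.

```python
# The same calculation that the function performs
pay_pretax = 40 * 24
raw_pay = pay_pretax * (1 - 0.22)
print(raw_pay)    # 748.8000000000001

# Round to 2 decimal places, which suits an amount in dollars and cents
tidy_pay = round(raw_pay, 2)
print(tidy_pay)   # 748.8
```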


The following code cell gives the same result as when we ran `get_pay(40)`, because `hourly_wage` is set to 15, and `tax_bracket` is set to 12%, which lines up with how we designed `get_pay`.

In [None]:
same_pay_fulltime = get_pay_with_more_inputs(40, 15, .12)
print(same_pay_fulltime)

528.0


Depending on how we plan to use this new function `get_pay_with_more_inputs()`, it can be more useful than the original function `get_pay()`, because it addresses more cases.  Instead of potentially incorrectly assuming the hourly wage and tax bracket, the new function allows the user to specify the correct values.  But, if you're sure the hourly wage and tax bracket won't need to change, the new function is just more complicated than necessary.  In general, when defining functions, you'll need to consider how much flexibility you need, based on your use case.
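
One way to get flexibility without giving up convenience is to give some arguments default values. This goes slightly beyond what the notebook has covered so far, and `get_pay_with_defaults()` is a hypothetical name introduced only for this sketch. An argument written as `name=value` in the header takes that value whenever the caller leaves it out.

```python
def get_pay_with_defaults(num_hours, hourly_wage=15, tax_bracket=0.12):
    # Pre-tax pay
    pay_pretax = num_hours * hourly_wage
    # After-tax pay
    pay_aftertax = pay_pretax * (1 - tax_bracket)
    return pay_aftertax

# Only the hours are supplied, so the defaults ($15/hour, 12%) are used
print(get_pay_with_defaults(40))   # 528.0

# Arguments can also be passed by name, which makes the call easier to read
print(get_pay_with_defaults(40, hourly_wage=24, tax_bracket=0.22))
```

The first call matches `get_pay(40)`, and the second matches the earlier call `get_pay_with_more_inputs(40, 24, .22)`.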

### Functions with no arguments

Note that it's possible to define a function that has no arguments and no return statement.  The `print_hello()` function in the code cell below is an example.

In [None]:
# Define the function with no arguments and with no return
def print_hello():
    print("Hello!")
    print("Good morning!")

# Call the function
print_hello()

Hello!
Good morning!
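
A function without a return statement still hands something back when it is called: the special value `None`, which stands for "nothing". The sketch below repeats the function so that it runs on its own. The variable name `result` is illustrative.

```python
def print_hello():
    print("Hello!")
    print("Good morning!")

# The function runs and prints its messages, but has no value to return
result = print_hello()
print(result)  # None
```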


### 👟 Exercise

**Question**

In the House Prices Prediction ML Application, you need to use information like the number of bedrooms and bathrooms to predict the price of a house.  Inspired by this, you'll write your own function to do this.

In the next code cell, create a function `get_expected_cost()` that has two arguments:
- `beds` - number of bedrooms
- `baths` - number of bathrooms

It should return the expected cost of a house with that number of bedrooms and bathrooms.  Assume that:
- the expected cost for a house with 0 bedrooms and 0 bathrooms is `80000`.  
- each bedroom adds `30000` to the expected cost
- each bathroom adds `10000` to the expected cost.

For instance,
- a house with 1 bedroom and 1 bathroom has an expected cost of `120000`, and
- a house with 2 bedrooms and 1 bathroom has an expected cost of `150000`.

In [None]:
# TODO: Complete the function
def get_expected_cost(beds, baths):
    value = 80000 + 30000 * beds + 10000 * baths
    return value

You are thinking about buying a home and want to get an idea of how much you will spend, based on the number of bedrooms and bathrooms.  You are trying to decide between four different options:
- Option 1: house with two bedrooms and three bathrooms
- Option 2: house with three bedrooms and two bathrooms
- Option 3: house with three bedrooms and three bathrooms
- Option 4: house with three bedrooms and four bathrooms

Use the `get_expected_cost()` function you defined above to set `option_one`, `option_two`, `option_three`, and `option_four` to the expected cost of each option.

In [None]:
# TODO: Use the get_expected_cost function to fill in each value
option_one = get_expected_cost(2, 3)
option_two = get_expected_cost(3, 2)
option_three = get_expected_cost(3, 3)
option_four = get_expected_cost(3, 4)

print(option_one)
print(option_two)
print(option_three)
print(option_four)

170000
190000
200000
210000


**Question**

You're a home decorator, and you'd like to use Python to streamline some of your work.  Specifically, you're creating a tool that you intend to use to calculate the cost of painting a room.

As a first step, define a function `get_cost()` that takes as input:
- `sqft_walls` = total square feet of walls to be painted
- `sqft_ceiling` = square feet of ceiling to be painted
- `sqft_per_gallon` = number of square feet that you can cover with one gallon of paint
- `cost_per_gallon` = cost (in dollars) of one gallon of paint

It should return the cost (in dollars) of putting one coat of paint on all walls and the ceiling.  Assume you can buy the exact amount of paint that you need, so you can buy partial gallons (e.g., if you need 7.523 gallons, you can buy that exact amount, instead of needing to buy 8 gallons and waste some paint).  Do not round your answer.

In [None]:
# TODO: Finish defining the function
def get_cost(sqft_walls, sqft_ceiling, sqft_per_gallon, cost_per_gallon):
    total_sqft = sqft_walls + sqft_ceiling
    gallons_needed = total_sqft / sqft_per_gallon
    cost = cost_per_gallon * gallons_needed
    return cost

Use the `get_cost()` function you defined above to calculate the cost of applying one coat of paint to a room with:
- 432 square feet of walls, and
- 144 square feet of ceiling.

Assume that one gallon of paint covers 400 square feet and costs $15.  As above, assume you can buy partial gallons of paint.  Do not round your answer.

In [None]:
# TODO: Set the project_cost variable to the cost of the project
project_cost = get_cost(432, 144, 400, 15)

print(project_cost)

21.599999999999998


#### 🔥 Optional Challenge

Now say you can no longer buy fractions of a gallon.  (For instance, if you need 4.3 gallons to do a project, then you have to buy 5 gallons of paint.)

With this new scenario, you will create a new function `get_actual_cost` that uses the same inputs and calculates the cost of your project.

One function that you'll need to use to do this is `math.ceil()`.  We demonstrate usage of this function in the code cell below.  It takes a number as input and rounds the number up to the nearest integer.

Run the next code cell to test this function for yourself.  Feel free to change the value of `test_value` and make sure `math.ceil()` returns the number you expect.

In [None]:
# Import the 'math' module
import math

test_value = 2.17

rounded_value = math.ceil(test_value)
print(rounded_value)

3
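
A few more inputs help to pin down what "rounds up" means, especially for numbers that are already whole and for negative numbers. The values below are chosen only for illustration.

```python
import math

print(math.ceil(4.3))     # 5
print(math.ceil(4.0))     # 4 (already a whole number, so nothing changes)
print(math.ceil(-2.17))   # -2 (rounding up moves toward the positive numbers)
```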


Use the next code cell to define the function `get_actual_cost()`.  You'll need to use the `math.ceil()` function to do this.

In [None]:
def get_actual_cost(sqft_walls, sqft_ceiling, sqft_per_gallon, cost_per_gallon):
    total_sqft = sqft_walls + sqft_ceiling
    gallons_needed = total_sqft / sqft_per_gallon
    gallons_to_buy = math.ceil(gallons_needed)
    cost = cost_per_gallon * gallons_to_buy
    return cost

Once your function is verified as correct, run the next code cell to calculate the updated cost of your project.

In [None]:
get_actual_cost(432, 144, 400, 15)

Say you're working with a slightly larger room.  Run the next code cell to calculate the cost of the project.

In [None]:
get_actual_cost(594, 288, 400, 15)

## 💿 Data Types

Whenever you create a variable in Python, it has a value with a corresponding data type.  There are many different data types, such as integers, floats, booleans, and strings, all of which we'll cover in this part.  (This is just a small subset of the available data types -- there are also dictionaries, sets, lists, tuples, and much more.)

Data types are important because they determine what kinds of actions you can do with them.  For instance, you can divide two floats, but you cannot divide two strings: `12.0/2.0` makes sense, but `"cat"/"dog"` does not.
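
The sketch below tries both divisions. It uses `try` and `except`, which this notebook does not cover, only so that the cell can report the error and keep going instead of stopping.

```python
# Dividing two floats works
print(12.0 / 2.0)  # 6.0

# Dividing two strings does not: Python raises a TypeError
try:
    print("cat" / "dog")
except TypeError as error:
    print("TypeError:", error)
```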

To avoid errors, we need to make sure that the actions match the data types that we have.

### Integers

Integers are numbers without any fractional part and can be positive (`1`, `2`, `3`, ...), negative (`-1`, `-2`, `-3`, ...), or zero (`0`).    

In the code cell below, we set a variable `x` to an integer.  We then verify the data type with `type()`, and need only pass the variable name into the parentheses.

In [None]:
x = 14
print(x)
print(type(x))

14
<class 'int'>


In the output above, `<class 'int'>` refers to the **int**eger data type.

### Floats

Floats are numbers with fractional parts. They can have many digits after the decimal point.

In [None]:
nearly_pi = 3.141592653589793238462643383279502884197169399375105820974944
print(nearly_pi)
print(type(nearly_pi))

3.141592653589793
<class 'float'>


We can also specify a float with a fraction.

In [None]:
almost_pi = 22/7
print(almost_pi)
print(type(almost_pi))

3.142857142857143
<class 'float'>


One function that is particularly useful for fractions is the `round()` function.  It lets you round a number to a specified number of decimal places.  

In [None]:
# Round to 5 decimal places
rounded_pi = round(almost_pi, 5)
print(rounded_pi)
print(type(rounded_pi))

3.14286
<class 'float'>
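
If you leave out the number of decimal places, `round()` returns the nearest integer instead of a float. The short sketch below also shows one behavior that often surprises people: a value exactly halfway between two integers is rounded to the even one.

```python
# With one argument, round() returns the nearest integer
print(round(3.7))         # 4
print(type(round(3.7)))   # <class 'int'>

# A value exactly halfway between two integers goes to the even one
print(round(2.5))         # 2
print(round(3.5))         # 4
```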


Whenever you write a number with a decimal point, Python recognizes it as a float data type.

For instance, `1.` (or `1.0`, `1.00`, etc) will be recognized as a float.  This is the case, even though these numbers technically have no fractional part!

In [None]:
y_float = 1.
print(y_float)
print(type(y_float))

1.0
<class 'float'>


You can convert a float to an integer with the `int()` function.  Try this out yourself by running the code cell below.

In [None]:
# Define a float
y = 1.
print(y)
print(type(y))

# Convert float to integer with the int function
z = int(y)
print(z)
print(type(z))

1.0
<class 'float'>
1
<class 'int'>


In this case, the float you are using has no nonzero digits after the decimal point.

**Question**

- But what happens when you try to convert a float with a fractional part to an integer?  
- How does the outcome of the `int` function change for positive and negative numbers?

Use the next code cell to investigate and answer these questions.  Feel free to add or remove any lines of code!

In [None]:
print(int(1.2321))
print(int(1.747))
print(int(-3.94535))
print(int(-2.19774))

1
1
-3
-2


Negative floats are always rounded UP to the closest integer (for instance, both `-1.1` and `-1.9` are rounded up to `-1`). Positive floats are always rounded DOWN to the closest integer (for instance, `2.1` and `2.9` are rounded down to `2`).
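
Put another way, `int()` simply drops the fractional part. The sketch below compares it with three other ways of turning a float into a whole number, using one of the negative values from the code cell above. The variable name `value` is illustrative.

```python
import math

value = -3.94535

print(int(value))          # -3 (drops the fractional part)
print(math.floor(value))   # -4 (always rounds down)
print(math.ceil(value))    # -3 (always rounds up)
print(round(value))        # -4 (goes to the nearest integer)
```

For positive numbers, `int()` and `math.floor()` agree; for negative numbers, `int()` and `math.ceil()` agree.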

### Booleans

Booleans represent one of two values: `True` or `False`.  In the code cell below, `z_one` is set to a boolean with value `True`.

In [None]:
z_one = True
print(z_one)
print(type(z_one))

True
<class 'bool'>


Next, `z_two` is set to a boolean with value `False`.

In [None]:
z_two = False
print(z_two)
print(type(z_two))

False
<class 'bool'>


Booleans are used to represent the truth value of an expression.  Since `1 < 2` is a true statement, `z_three` takes on a value of `True`.

In [None]:
z_three = (1 < 2)
print(z_three)
print(type(z_three))

True
<class 'bool'>


Similarly, since `5 < 3` is a false statement, `z_four` takes on a value of `False`.

In [None]:
z_four = (5 < 3)
print(z_four)
print(type(z_four))

False
<class 'bool'>


We can switch the value of a boolean by using `not`.  So, `not True` is equivalent to `False`, and `not False` becomes `True`.

In [None]:
z_five = not z_four
print(z_five)
print(type(z_five))

True
<class 'bool'>


Booleans will be important in the next part, when you learn about conditions and conditional statements.

**Question**

Now, your goal is to determine what happens when you multiply a boolean by any of these data types.  Specifically,
- What happens when you multiply an integer or float by `True`?  What happens when you multiply them by `False`?  How does the answer change if the numbers are positive or negative?
- What happens when you multiply a string by `True`?  By `False`?

Use the next code cell for your investigation.

In [None]:
print(3 * True)
print(-3.1 * True)
print(type("abc" * False))
print(len("abc" * False))

3
-3.1
<class 'str'>
0


When you multiply an integer or float by a boolean with value `True`, it just returns that same integer or float (and is equivalent to multiplying by 1).  If you multiply an integer or float by a boolean with value `False`, the result is always zero.  This is true for both positive and negative numbers.  If you multiply a string by a boolean with value `True`, it just returns that same string.  And if you multiply a string by a boolean with value `False`, it returns an empty string (or a string with length zero).
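
These results follow from the fact that Python treats `True` as the number 1 and `False` as the number 0 in arithmetic. A short sketch that makes this visible:

```python
# Converting booleans to integers shows their numeric values
print(int(True))     # 1
print(int(False))    # 0

# Booleans can be compared with, and added like, those numbers
print(True == 1)     # True
print(True + True)   # 2
```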

### Strings

The string data type is a collection of characters (like alphabet letters, punctuation, numerical digits, or symbols) contained in quotation marks.  Strings are commonly used to represent text.

In [None]:
w = "Hello, Python!"
print(w)
print(type(w))

Hello, Python!
<class 'str'>


You can get the length of a string with `len()`.  `"Hello, Python!"` has length 14, because it has 14 characters, including the space, comma, and exclamation mark.  Note that the quotation marks are not included when calculating the length.

In [None]:
print(len(w))

14


One special type of string is the empty string, which has length zero.

In [None]:
shortest_string = ""
print(type(shortest_string))
print(len(shortest_string))

<class 'str'>
0


If you put a number in quotation marks, it has a string data type.

In [None]:
my_number = "1.12321"
print(my_number)
print(type(my_number))

1.12321
<class 'str'>


If we have a string that is convertible to a float, we can use `float()`.  

This won't always work!  For instance, we can convert `"10.43430"` and `"3"` to floats, but we cannot convert `"Hello, Python!"` to a float.

In [None]:
also_my_number = float(my_number)
print(also_my_number)
print(type(also_my_number))

1.12321
<class 'float'>
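
The sketch below tries all three of the strings mentioned above. The failing conversion raises a `ValueError`, which is caught here with `try` and `except` (not covered in this notebook) so that the cell can print a message instead of stopping.

```python
# These strings look like numbers, so the conversion works
print(float("10.43430"))   # 10.4343
print(float("3"))          # 3.0

# This string does not look like a number, so Python raises a ValueError
try:
    float("Hello, Python!")
except ValueError as error:
    print("ValueError:", error)
```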


Just like you can add two numbers (floats or integers), you can also add two strings.  It results in a longer string that combines the two original strings by concatenating them.

In [None]:
new_string = "abc" + "def"
print(new_string)
print(type(new_string))

abcdef
<class 'str'>


Note that it's not possible to do subtraction or division with two strings.  You also can't multiply two strings, but you can multiply a string by an integer.  This again results in a string that's just the original string concatenated with itself a specified number of times.

In [None]:
newest_string = "abc" * 3
print(newest_string)
print(type(newest_string))

abcabcabc
<class 'str'>


Note that you cannot multiply a string by a float!  Trying to do so will raise an error.

In [None]:
will_not_work = "abc" * 3.0

TypeError: can't multiply sequence by non-int of type 'float'
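
If the number of repetitions happens to be stored as a float, one possible fix is to convert it to an integer with `int()` before multiplying. The names `repeat_count` and `will_work` are hypothetical and used only for this sketch.

```python
# A float, perhaps the result of an earlier calculation
repeat_count = 3.0

# Converting to an integer first makes the multiplication valid
will_work = "abc" * int(repeat_count)
print(will_work)  # abcabcabc
```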

### 👟 Exercise

In this question, you will write a function that estimates the value of a house.

Use the next code cell to create a function `get_expected_cost` that takes as input three variables:
- `beds` - number of bedrooms (data type float)
- `baths` - number of bathrooms (data type float)
- `has_basement` - whether or not the house has a basement (data type boolean)

It should return the expected cost of a house with those characteristics. Assume that:
- the expected cost for a house with 0 bedrooms, 0 bathrooms, and no basement is 80000,
- each bedroom adds 30000 to the expected cost,
- each bathroom adds 10000 to the expected cost, and
- a basement adds 40000 to the expected cost.

For instance,
- a house with 1 bedroom, 1 bathroom, and no basement has an expected cost of 80000 + 30000 + 10000 = 120000.  This value will be calculated with `get_expected_cost(1, 1, False)`.
- a house with 2 bedrooms, 1 bathroom, and a basement has an expected cost of 80000 + 2*30000 + 10000 + 40000 = 190000.  This value will be calculated with `get_expected_cost(2, 1, True)`.

In [None]:
# TODO: Complete the function
def get_expected_cost(beds, baths, has_basement):
    value = 80000 + 30000 * beds + 10000 * baths + 40000 * has_basement
    return value

In [None]:
get_expected_cost(1, 1, False)

In [None]:
get_expected_cost(2, 1, True)
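The solution above multiplies `40000` by `has_basement`, which works because Python treats `True` as 1 and `False` as 0 in arithmetic. The sketch below shows that behavior and then checks the two worked examples from the exercise text; the function is repeated only so the cell runs on its own.

```python
# In arithmetic, True behaves like 1 and False behaves like 0
print(40000 * True)    # 40000
print(40000 * False)   # 0

# Same function as above, repeated so this cell is self-contained
def get_expected_cost(beds, baths, has_basement):
    value = 80000 + 30000 * beds + 10000 * baths + 40000 * has_basement
    return value

# The two worked examples from the exercise text
print(get_expected_cost(1, 1, False))  # 120000
print(get_expected_cost(2, 1, True))   # 190000
```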

#### 🔥 Optional Challenge

You own an online shop where you sell rings with custom engravings.  You offer both gold plated and solid gold rings.
- Gold plated rings have a base cost of \\$50, and you charge \\$7 per engraved unit.  
- Solid gold rings have a base cost of \\$100, and you charge \\$10 per engraved unit.
- Spaces and punctuation are counted as engraved units.

Write a function `cost_of_project()` that takes two arguments:
- `engraving` - a Python string with the text of the engraving
- `solid_gold` - a Boolean that indicates whether the ring is solid gold

It should return the cost of the project. This question should be fairly challenging.

In [None]:
def cost_of_project(engraving, solid_gold):
    cost = solid_gold * (100 + 10 * len(engraving)) + (not solid_gold) * (50 + 7 * len(engraving))
    return cost

Run the next code cell to calculate the cost of engraving `Charlie+Denver` on a solid gold ring.

In [None]:
project_one = cost_of_project("Charlie+Denver", True)
print(project_one)

240


Use the next code cell to calculate the cost of engraving `08/10/2000` on a gold plated ring.

In [None]:
project_two = cost_of_project("08/10/2000", False)
print(project_two)

120
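To see where the two results come from, it may help to work through the arithmetic by hand. The sketch below counts the engraved units in each message and applies the price rules directly; the variable names `engraving_one` and `engraving_two` are illustrative and not part of the exercise.

```python
engraving_one = "Charlie+Denver"
engraving_two = "08/10/2000"

# Every character counts as one engraved unit, including "+" and "/"
print(len(engraving_one))  # 14
print(len(engraving_two))  # 10

# Solid gold: base cost of 100 plus 10 per engraved unit
print(100 + 10 * len(engraving_one))  # 240

# Gold plated: base cost of 50 plus 7 per engraved unit
print(50 + 7 * len(engraving_two))    # 120
```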


## 🤔 Conditions and Conditional Statements

You have already seen that when you change the input value to a function, you often get a different output.  For instance, consider an `add_five()` function that just adds five to any number and returns the result.  Then `add_five(7)` will return an output of 12 (= 7 + 5), and `add_five(8)` will return an output of 13 (= 8 + 5).  Note that no matter what the input is, the action that the function performs is always the same: it always adds five.

But you might instead need a function that performs an action that depends on the input.  For instance, you might need a function `add_three_or_eight()` that adds three if the input is less than 10, and adds eight if the input is 10 or more.  Then `add_three_or_eight(1)` will return 4 (= 1 + 3), but `add_three_or_eight(11)` will return 19 (= 11 + 8).  In this case, the action that the function performs varies with the input.

In this part, you will learn how to use conditions and conditional statements to modify how your functions run.

### Conditions

In programming, **conditions** are statements that are either `True` or `False`.  There are many different ways to write conditions in Python, but some of the most common ways of writing conditions just compare two different values.  For instance, you can check if 2 is greater than 3.

In [None]:
print(2 > 3)

False


Python identifies this as `False`, since 2 is not greater than 3.

You can also use conditions to compare the values of variables.  In the next code cell, `var_one` has a value of 1, and `var_two` has a value of 2.  In the conditions, we check if `var_one` is less than 1 (which is `False`), and we check if `var_two` is greater than or equal to `var_one` (which is `True`).

In [None]:
var_one = 1
var_two = 2

print(var_one < 1)
print(var_two >= var_one)

For a list of common symbols you can use to construct conditions, check out the chart below.

<table style="width: 100%;">
<tbody>
<tr><th><b>Symbol</b></th><th><b>Meaning</b></th></tr>
<tr>
<td>==</td>
<td>equals</td>
</tr>
<tr>
<td>!=</td>
<td>does not equal</td>
</tr>
<tr>
<td>&#60;</td>
<td>less than</td>
</tr>
<tr>
<td>&#60;=</td>
<td>less than or equal to</td>
</tr>
<tr>
<td>&#62;</td>
<td>greater than</td>
</tr>
<tr>
<td>&#62;=</td>
<td>greater than or equal to</td>
</tr>
</tbody>
</table>

**Important Note**: When you check whether two values are equal, make sure you use the `==` sign, and not the `=` sign.
- `var_one==1` checks if the value of `var_one` is 1, but
- `var_one=1` sets the value of `var_one` to 1.
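The difference between the two signs is easy to see in a short cell. In this sketch, `example_var` is an illustrative name chosen so that it does not overwrite `var_one`.

```python
example_var = 5          # a single = assigns the value 5

print(example_var == 5)  # a double == compares, so this prints True
print(example_var == 1)  # prints False

# Comparing did not change the stored value
print(example_var)       # 5
```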

### Conditional statements

**Conditional statements** use conditions to modify how your function runs.  They check the value of a condition, and if the condition evaluates to `True`, then a certain block of code is executed.  (Otherwise, if the condition is `False`, then the code is not run.)  

You will see several examples of this in the following sections.

#### "if" statements

The simplest type of conditional statement is an "if" statement.  You can see an example of this in the `evaluate_temp()` function below.  The function accepts a body temperature (in Celsius) as input.
- Initially, `message` is set to `"Normal temperature."`.
- Then, if `temp > 38` is `True` (i.e., the body temperature is greater than 38°C), the message is updated to `"Fever!"`.  Otherwise, if `temp > 38` is `False`, then the message is not updated.
- Finally, `message` is returned by the function.

In [None]:
def evaluate_temp(temp):
    # Set an initial message
    message = "Normal temperature."
    # Update value of message only if temperature greater than 38
    if temp > 38:
        message = "Fever!"
    return message

In the next code cell, we call the function, where the temperature is 37°C. The message is `"Normal temperature."`, because the temperature is less than 38°C (`temp > 38` evaluates to `False`) in this case.

In [None]:
print(evaluate_temp(37))

Normal temperature.


However, if the temperature is instead 39°C, since this is greater than 38°C, the message is updated to `"Fever!"`.

In [None]:
print(evaluate_temp(39))

Note that there are two levels of indentation:
- The first level of indentation is because we always need to indent the code block inside a function.
- The second level of indentation is because we also need to indent the code block belonging to the "if" statement.  (As you'll see, we'll also need to indent the code blocks for "elif" and "else" statements.)

Note that because the return statement is not indented under the "if" statement, it is always executed, whether `temp > 38` is `True` or `False`.

#### "if ... else" statements

We can use "else" statements to run code if a condition is `False`.  The code under the "if" statement is run if the condition is `True`, and the code under "else" is run if the condition is `False`.

In [None]:
def evaluate_temp_with_else(temp):
    if temp > 38:
        message = "Fever!"
    else:
        message = "Normal temperature."
    return message

This `evaluate_temp_with_else()` function has equivalent behavior to the `evaluate_temp()` function.

In the next code cell, we call this new function, where the temperature is 37°C.  In this case, `temp > 38` evaluates to `False`, so the code under the "else" statement is executed, and the `Normal temperature.` message is returned.

In [None]:
print(evaluate_temp_with_else(37))

As with the previous function, we indent the code blocks after the "if" and "else" statements.  
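The claim that `evaluate_temp_with_else()` behaves the same as `evaluate_temp()` can be spot-checked by calling both with the same inputs. This is only a sketch over three sample temperatures, not a proof; both functions are repeated so the cell runs on its own.

```python
def evaluate_temp(temp):
    message = "Normal temperature."
    if temp > 38:
        message = "Fever!"
    return message

def evaluate_temp_with_else(temp):
    if temp > 38:
        message = "Fever!"
    else:
        message = "Normal temperature."
    return message

# Each comparison prints True because both functions return the same message
print(evaluate_temp(37) == evaluate_temp_with_else(37))
print(evaluate_temp(38) == evaluate_temp_with_else(38))
print(evaluate_temp(39) == evaluate_temp_with_else(39))
```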

#### "if ... elif ... else" statements

We can use "elif" (which is short for "else if") to check if multiple conditions might be true.  The function below:
- First checks if `temp > 38`.  If this is true, then the message is set to `"Fever!"`.
- As long as the message has not already been set, the function then checks if `temp > 35`.  If this is true, then the message is set to `"Normal temperature."`.
- Then, if still no message has been set, the "else" statement ensures that the message is set to `"Low temperature."`.

You can think of "elif" as saying ... "okay, that previous condition (e.g., `temp > 38`) was false, so let's check if this new condition (e.g., `temp > 35`) might be true!"

In [None]:
def evaluate_temp_with_elif(temp):
    if temp > 38:
        message = "Fever!"
    elif temp > 35:
        message = "Normal temperature."
    else:
        message = "Low temperature."
    return message

In the code cell below, we run the code under the "elif" statement, because `temp > 38` is `False`, and `temp > 35` is `True`.  Once this code is run, the function skips over the "else" statement and returns the message.

In [None]:
evaluate_temp_with_elif(36)

Finally, we try out a case where the temperature is less than 35°C.  Since the conditions in the "if" and "elif" statements both evaluate to `False`, the code block inside the "else" statement is executed.

In [None]:
evaluate_temp_with_elif(34)
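Because both conditions use a strict `>`, the boundary values themselves fall into the lower category. The sketch below checks exactly 38 and exactly 35; the function is repeated so the cell runs on its own.

```python
def evaluate_temp_with_elif(temp):
    if temp > 38:
        message = "Fever!"
    elif temp > 35:
        message = "Normal temperature."
    else:
        message = "Low temperature."
    return message

# 38 is not greater than 38, so the "elif" branch runs
print(evaluate_temp_with_elif(38))  # Normal temperature.

# 35 is not greater than 35, so the "else" branch runs
print(evaluate_temp_with_elif(35))  # Low temperature.
```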

### Example - Calculations

In the examples so far, conditional statements were used to decide how to set the values of variables.  But you can also use conditional statements to perform different calculations.

In this next example, say you live in a country with only two tax brackets.  Everyone earning less than 12,000 pays 25% in taxes, and anyone earning 12,000 or more pays 30%.  The function below calculates how much tax is owed.

In [None]:
def get_taxes(earnings):
    if earnings < 12000:
        tax_owed = .25 * earnings
    else:
        tax_owed = .30 * earnings
    return tax_owed

The next code cell uses the function.

In [None]:
ana_taxes = get_taxes(9000)
bob_taxes = get_taxes(15000)

print(ana_taxes)
print(bob_taxes)

In each case, we call the `get_taxes()` function and use the value that is returned to set the value of a variable.
- For `ana_taxes`, we calculate taxes owed by a person who earns 9,000.  In this case, we call the `get_taxes()` function with `earnings` set to `9000`.  Thus, `earnings < 12000` is `True`, and `tax_owed` is set to `.25 * 9000`.  Then we return the value of `tax_owed`.
- For `bob_taxes`, we calculate taxes owed by a person who earns 15,000.  In this case, we call the `get_taxes()` function with `earnings` set to `15000`.  Thus, `earnings < 12000` is `False`, and `tax_owed` is set to `.30 * 15000`.  Then we return the value of `tax_owed`.
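It can also be useful to test the values right at the edge of a bracket. The sketch below calls the same function with earnings of 11,999 and 12,000, two inputs chosen here for illustration. Since the function applies a single rate to all earnings, the tax owed jumps noticeably between them.

```python
def get_taxes(earnings):
    if earnings < 12000:
        tax_owed = .25 * earnings
    else:
        tax_owed = .30 * earnings
    return tax_owed

# Just under the threshold: taxed at 25%
print(get_taxes(11999))  # 2999.75

# Exactly at the threshold: 11999 < 12000 no longer applies, so taxed at 30%
print(get_taxes(12000))
```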

Before we move on to another example - remember the `add_three_or_eight()` function from the introduction?  It accepts a number as input and adds three if the input is less than 10, and otherwise adds eight.  Can you figure out how you would write this function?  Once you have an answer, compare it with the solution in the next code cell.

In [None]:
def add_three_or_eight(number):
    if number < 10:
        result = number + 3
    else:
        result = number + 8
    return result
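As a quick check, the solution can be called with the two inputs from the introduction, plus the boundary value 10, which was added here for illustration. The function is repeated so the cell runs on its own.

```python
def add_three_or_eight(number):
    if number < 10:
        result = number + 3
    else:
        result = number + 8
    return result

print(add_three_or_eight(1))   # 4
print(add_three_or_eight(11))  # 19

# 10 is not less than 10, so eight is added
print(add_three_or_eight(10))  # 18
```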

### Example - Multiple "elif" statements

So far, you have seen "elif" used only once in a function.  But there's no limit to the number of "elif" statements you can use.  For instance, the next block of code calculates the dose of medication (in milliliters) to give to a child, based on weight (in kilograms).

Note: This function should not be used as medical advice, and represents a fake medication.

In [None]:
def get_dose(weight):
    # Dosage is 1.25 ml for anyone under 5.2 kg
    if weight < 5.2:
        dose = 1.25
    elif weight < 7.9:
        dose = 2.5
    elif weight < 10.4:
        dose = 3.75
    elif weight < 15.9:
        dose = 5
    elif weight < 21.2:
        dose = 7.5
    # Dosage is 10 ml for anyone 21.2 kg or over
    else:
        dose = 10
    return dose

The next code cell runs the function.  Make sure that the output makes sense to you!
- In this case, the "if" statement evaluates to `False`, and each "elif" statement evaluates to `False` until we get to `weight < 15.9`, which is `True`, so `dose` is set to 5.
- Once an "elif" statement evaluates to `True` and the code block is run, the function skips over all remaining "elif" and "else" statements.  After skipping these, all that is left is the return statement, which returns the value of `dose`.
- The order of the "elif" statements does matter here!  Re-ordering the statements can return a very different result.

In [None]:
print(get_dose(12))
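To illustrate why the order matters, here is a hypothetical, shortened variant named `get_dose_misordered()` that checks the widest condition first. It is not part of the original function and exists only to show the effect of re-ordering.

```python
def get_dose_misordered(weight):
    # Hypothetical variant: the widest condition is checked first,
    # so every weight under 21.2 kg stops here
    if weight < 21.2:
        dose = 7.5
    elif weight < 15.9:
        dose = 5
    else:
        dose = 10
    return dose

# A weight of 12 kg now gets 7.5 instead of the 5 returned by get_dose(12)
print(get_dose_misordered(12))  # 7.5
```

With this ordering, the `weight < 15.9` branch can never run, because any weight under 15.9 is also under 21.2.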

### 👟 Exercise

You work at a college admissions office.  When inspecting a dataset of college applicants, you notice that some students have represented their grades with letters (`"A"`, `"B"`, `"C"`, `"D"`, `"F"`), whereas others have represented their grades with a number between 0 and 100.

You realize that for consistency, all of the grades should be formatted in the same way, and you decide to format them all as letters.  For the conversion, you decide to assign:
- `"A"` - any grade 90-100, inclusive
- `"B"` - any grade 80-89, inclusive
- `"C"` - any grade 70-79, inclusive
- `"D"` - any grade 60-69, inclusive
- `"F"` - any grade <60

Write a function `get_grade()` that takes as input:
- `score` - an integer 0-100 corresponding to a numerical grade

It should return a Python string with the letter grade that it corresponds to.  For instance,
- A score of 85 corresponds to a B grade.  In other words, `get_grade(85)` should return `"B"`.
- A score of 49 corresponds to an F grade.  In other words, `get_grade(49)` should return `"F"`.

Make sure that when supplying the grade that is returned by the function, it is enclosed in quotes.  (For instance, if you want to return `"A"`, you should write `return "A"` and not `return A`.)

In [None]:
# TODO: Edit the function to return the correct grade for different scores
def get_grade(score):
    if score >= 90:
        grade = "A"
    elif score >= 80:
        grade = "B"
    elif score >= 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    return grade
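One way to gain confidence in the solution is to try the two examples from the exercise text along with scores on either side of a cutoff. The extra scores (90, 89, 60, 59) are chosen here for illustration; the function is repeated so the cell runs on its own.

```python
def get_grade(score):
    if score >= 90:
        grade = "A"
    elif score >= 80:
        grade = "B"
    elif score >= 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    return grade

# Examples from the exercise text
print(get_grade(85))  # B
print(get_grade(49))  # F

# Scores on either side of a cutoff
print(get_grade(90))  # A
print(get_grade(89))  # B
print(get_grade(60))  # D
print(get_grade(59))  # F
```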

In the exercise for the previous section, you wrote a function `cost_of_project()` that estimated the price of rings for an online shop that sells rings with custom engravings.  This function did not use conditional statements.  In this exercise, you will rewrite the function to use conditional statements.  Recall that the online shop has the following price structure:
- Gold plated rings have a base cost of \\$50, and you charge \\$7 per engraved unit.  
- Solid gold rings have a base cost of \\$100, and you charge \\$10 per engraved unit.
- Spaces and punctuation are counted as engraved units.

Your function `cost_of_project()` takes two arguments:
- `engraving` - a Python string with the text of the engraving
- `solid_gold` - a Boolean that indicates whether the ring is solid gold

It should return the cost of the project.  

The function has been partially completed for you, and you need to fill in the blanks to complete the function.

In [None]:
def cost_of_project(engraving, solid_gold):
    num_units = len(engraving)
    if solid_gold:
        cost = 100 + 10 * num_units
    else:
        cost = 50 + 7 * num_units
    return cost
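The conditional version should give the same prices as the arithmetic version from the previous section. The sketch below places the two side by side under the illustrative names `cost_with_arithmetic()` and `cost_with_conditions()` (renamed here so that neither overwrites the other) and compares them on the two engravings used earlier.

```python
def cost_with_arithmetic(engraving, solid_gold):
    # Version from the previous section: relies on True == 1 and False == 0
    cost = solid_gold * (100 + 10 * len(engraving)) + (not solid_gold) * (50 + 7 * len(engraving))
    return cost

def cost_with_conditions(engraving, solid_gold):
    # Version from this section: picks one formula with "if ... else"
    num_units = len(engraving)
    if solid_gold:
        cost = 100 + 10 * num_units
    else:
        cost = 50 + 7 * num_units
    return cost

print(cost_with_conditions("Charlie+Denver", True))  # 240
print(cost_with_conditions("08/10/2000", False))     # 120

# Both comparisons print True
print(cost_with_arithmetic("Charlie+Denver", True) == cost_with_conditions("Charlie+Denver", True))
print(cost_with_arithmetic("08/10/2000", False) == cost_with_conditions("08/10/2000", False))
```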

## 📋 Lists

When doing data science, you need a way to organize your data so you can work with it efficiently.  Python has many data structures available for holding your data, such as lists, sets, dictionaries, and tuples.  In this section, you will learn how to work with Python lists.

Say you want to organize the names of the flower species in a dataset.  One way to do this is to store the names in a single Python string.

In [None]:
flowers = "pink primrose,hard-leaved pocket orchid,canterbury bells,sweet pea,english marigold,tiger lily,moon orchid,bird of paradise,monkshood,globe thistle"

print(type(flowers))
print(flowers)

Even better is to represent the same data in a Python list.  To create a list, you need to use square brackets (`[`, `]`) and separate each item with a comma.  Every item in the list is a Python string, so each is enclosed in quotation marks.

In [None]:
flowers_list = ["pink primrose", "hard-leaved pocket orchid", "canterbury bells", "sweet pea", "english marigold", "tiger lily", "moon orchid", "bird of paradise", "monkshood", "globe thistle"]

print(type(flowers_list))
print(flowers_list)

At first glance, it doesn't look too different, whether you represent the information in a Python string or list.  But as you will see, there are a lot of tasks that you can more easily do with a list.  For instance, a list will make it easier to:
- get an item at a specified position (first, second, third, etc),
- check the number of items, and
- add and remove items.

### Length

We can count the number of entries in any list with `len()`, which is short for "length".  You need only supply the name of the list in the parentheses.

In [None]:
# The list has ten entries
print(len(flowers_list))

### Indexing

We can refer to any item in the list according to its position in the list (first, second, third, etc).  This is called **indexing**.

Note that Python uses zero-based indexing, which means that:
- to pull the first entry in the list, you use 0,
- to pull the second entry in the list, you use 1, and
- to pull the final entry in the list, you use one less than the length of the list.

In [None]:
print("First entry:", flowers_list[0])
print("Second entry:", flowers_list[1])

# The list has length ten, so we refer to final entry with 9
print("Last entry:", flowers_list[9])

**Side Note**: You may have noticed that in the code cell above, we use a single `print()` to print multiple items: both a Python string (like `"First entry:"`) and a value from the list (like `flowers_list[0]`).  To print multiple things in Python with a single command, we need only separate them with a comma.
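Indexing is one place where the difference between a string and a list becomes visible. The sketch below uses a shortened, three-flower version of the data under the illustrative names `flowers_text` and `flowers_items`, so the original variables are left untouched.

```python
flowers_text = "pink primrose,sweet pea,tiger lily"
flowers_items = ["pink primrose", "sweet pea", "tiger lily"]

# Indexing a string gives a single character,
# while indexing a list gives a whole item
print(flowers_text[0])   # p
print(flowers_items[0])  # pink primrose

# The last index is always one less than the length of the list
last_index = len(flowers_items) - 1
print(last_index)                 # 2
print(flowers_items[last_index])  # tiger lily
```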

### Slicing

You can also pull a segment of a list (for instance, the first three entries or the last two entries).  This is called **slicing**.  For instance:
- to pull the first `x` entries, you use `[:x]`, and
- to pull the last `y` entries, you use `[-y:]`.

In [None]:
print("First three entries:", flowers_list[:3])
print("Final two entries:", flowers_list[-2:])

As you can see above, when we slice a list, it returns a new, shortened list.
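A short sketch of that point: slicing produces new lists and leaves the original unchanged. The list `sample_flowers` is a small illustrative example, not the full `flowers_list`.

```python
sample_flowers = ["pink primrose", "sweet pea", "tiger lily", "moon orchid"]

first_two = sample_flowers[:2]
last_three = sample_flowers[-3:]

print(first_two)   # ['pink primrose', 'sweet pea']
print(last_three)  # ['sweet pea', 'tiger lily', 'moon orchid']

# The original list still has all four entries
print(len(sample_flowers))  # 4
```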


### Removing items

Remove an item from a list with `.remove()`, and put the item you would like to remove in parentheses.

In [None]:
flowers_list.remove("globe thistle")
print(flowers_list)

### Adding items

Add an item to a list with `.append()`, and put the item you would like to add in parentheses.

In [None]:
flowers_list.append("snapdragon")
print(flowers_list)
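Two details about these methods are worth seeing in action: both change the list in place, and `.remove()` only removes the first matching item. The sketch below uses a small illustrative list named `small_garden` that deliberately contains a duplicate.

```python
small_garden = ["sweet pea", "tiger lily", "sweet pea"]

# Only the first "sweet pea" is removed
small_garden.remove("sweet pea")
print(small_garden)  # ['tiger lily', 'sweet pea']

# The new item is added to the end of the list
small_garden.append("snapdragon")
print(small_garden)  # ['tiger lily', 'sweet pea', 'snapdragon']
```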

### Lists are not just for strings

So far, we have only worked with lists where each item in the list is a string.  But lists can have items with any data type, including booleans, integers, and floats.

As an example, consider hardcover book sales in the first week of April 2000 in a retail store.

In [None]:
hardcover_sales = [139, 128, 172, 139, 191, 168, 170]

Here, `hardcover_sales` is a list of integers.  Just as when the list holds strings, you can still do things like get the length, pull individual entries, and extend the list.

In [None]:
print("Length of the list:", len(hardcover_sales))
print("Entry at index 2:", hardcover_sales[2])

You can also get the minimum with `min()` and the maximum with `max()`.

In [None]:
print("Minimum:", min(hardcover_sales))
print("Maximum:", max(hardcover_sales))

To add every item in the list, use `sum()`.

In [None]:
print("Total books sold in one week:", sum(hardcover_sales))

We can also do similar calculations with slices of the list.  In the next code cell, we take the sum from the first five days (`sum(hardcover_sales[:5])`), and then divide by five to get the average number of books sold in the first five days.

In [None]:
print("Average books sold in first five days:", sum(hardcover_sales[:5])/5)
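One possible refinement, offered here as a sketch, is to divide by the length of the slice instead of typing the number five by hand, so the slice and the divisor cannot get out of step. The name `first_five` is illustrative.

```python
hardcover_sales = [139, 128, 172, 139, 191, 168, 170]

# Store the slice once, then reuse it for both the sum and the count
first_five = hardcover_sales[:5]
average_first_five = sum(first_five) / len(first_five)

print(sum(first_five))     # 769
print(len(first_five))     # 5
print(average_first_five)  # 153.8
```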

### 👟 Exercise

The list `num_customers` contains the number of customers who came into your restaurant every day over the last month (which lasted thirty days).  Fill in values for each of the following:
- `avg_first_seven` - average number of customers who visited in the first seven days
- `avg_last_seven` - average number of customers who visited in the last seven days
- `max_month` - number of customers on the day that got the most customers in the last month
- `min_month` - number of customers on the day that got the fewest customers in the last month

Answer this question by writing code.  For instance, if you have to find the minimum value in a list, use `min()` instead of scanning for the smallest value and directly filling in a number.

In [None]:
# Do not change: Number of customers each day for the last month
num_customers = [137, 147, 135, 128, 170, 174, 165, 146, 126, 159,
                 141, 148, 132, 147, 168, 153, 170, 161, 148, 152,
                 141, 151, 131, 149, 164, 163, 143, 143, 166, 171]

# TODO: Fill in values for the variables below
avg_first_seven = sum(num_customers[:7])/7
avg_last_seven = sum(num_customers[-7:])/7
max_month = max(num_customers)
min_month = min(num_customers)

In a later part, you'll learn all about **list comprehensions**, which allow you to create a list based on the values in another list.  In this question, you'll get a brief preview of how they work.

Say we're working with the list below.

In [None]:
test_ratings = [1, 2, 3, 4, 5]

Then we can use this list (`test_ratings`) to create a new list (`test_liked`) where each item has been turned into a boolean, depending on whether or not the item is greater than or equal to four.

In [None]:
test_liked = [i>=4 for i in test_ratings]
print(test_liked)

In this question, you'll use this list comprehension to define a function `percentage_liked()` that takes one argument as input:
- `ratings`: list of ratings that people gave to a movie, where each rating is a number between 1-5, inclusive

We say someone liked the movie if they gave a rating of either 4 or 5.  Your function should return the percentage of people who liked the movie.

For instance, if we supply a value of `[1, 2, 3, 4, 5, 4, 5, 1]`, then 50% (4/8) of the people liked the movie, and the function should return `0.5`.

Part of the function has already been completed for you.  You need only use `list_liked` to calculate `percentage_liked`.

In [None]:
def percentage_liked(ratings):
    list_liked = [i >= 4 for i in ratings]
    # TODO: Complete the function
    percentage_liked = sum(list_liked)/len(list_liked)
    return percentage_liked

# Do not change: should return 0.5
percentage_liked([1, 2, 3, 4, 5, 4, 5, 1])
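The solution works because `sum()` counts each `True` as 1 and each `False` as 0, so summing the list of booleans gives the number of people who liked the movie. The sketch below walks through the steps for the example input, using the illustrative names `sample_ratings` and `sample_liked`.

```python
sample_ratings = [1, 2, 3, 4, 5, 4, 5, 1]

# One boolean per rating: True if the rating is 4 or 5
sample_liked = [i >= 4 for i in sample_ratings]
print(sample_liked)  # [False, False, False, True, True, True, True, False]

# Each True counts as 1, so the sum is the number of people who liked the movie
print(sum(sample_liked))  # 4
print(len(sample_liked))  # 8

# Dividing gives the share of people who liked it
print(sum(sample_liked) / len(sample_liked))  # 0.5
```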