# <font color = firebrick>Tutorial 2: Python data</font> <a id='home'></a>

Before we start to work with data, we need to learn the basics of Python. The objective is to learn enough to be able to do interesting work with data without fully knowing Python.

In this notebook, we will focus on the following points:
1. [Assignment](#assignment)
2. [Calculation](#calculation)
3. [Help](#help)
4. [Strings](#strings)
5. [Formatted printing](#print)

# 1. Variable assignment<a id="assignment"></a> ([top](#home))
Run the following code block (Shift+Enter). What does the code do?

In [None]:
# Create and assign the value 10 to the variable x
x = 10 # Indeed, we've created and assigned the value 10 to the variable x

# Print the value of x as an output
print(x)

Let's try to understand the code above by reading line by line.
- __line 1:__ The pound sign (\#) in code indicates a comment. The Python interpreter ignores this comment line. A comment can appear at the beginning of a line or after a line of code.
- __line 2:__ The variable ``x`` is created with the *value* 10.
- __line 5:__ The Python interpreter prints the value of the ``x`` variable, i.e. ``10``. The ``print()`` function is a built-in Python function. More on this later.

In [None]:
# Variable names generally need to be self-explanatory in order to be readable.
my_followers = 4000   # Number of followers on my account
print(my_followers)

The underscore symbol in the variable name makes it more readable. Variable names must start with a letter or an underscore, and can include letters, numbers, and underscores. Variable names are case sensitive. 
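As a quick check of the naming rules above, the sketch below uses the string method `isidentifier()`, which reports whether a piece of text could serve as a variable name. The candidate names are made up for illustration.

```python
# Hypothetical candidate names, chosen only for illustration
candidates = ["my_followers", "_followers", "followers2", "my-followers", "my followers"]

# isidentifier() returns True when the text follows Python's naming rules
valid_names = [name for name in candidates if name.isidentifier()]
print(valid_names)
```

The names containing a hyphen or a space are rejected, because only letters, numbers, and underscores are allowed.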

**Task 1:** Before running the code below, try to guess the output.

In [None]:
myFollowers = 500     # My current social media followers
myfollowers = 1500    # After a successful marketing campaign
MYFOLLOWERS = 10000   # With a viral post, I reached a new milestone
my_followers_2 = 100  # When I first started

print(myFollowers)
print(myfollowers, MYFOLLOWERS)
print(my_followers, my_followers_2, myFollowers)

Multiple variables can be printed with a single `print()` call. Note that the `my_followers` variable from an earlier code cell is still available in later code cells. What happens when you execute the following code cells?

In [None]:
2_my_followers = 5 

In [None]:
# Assign the value in my_followers_2 to another variable
my_followers_3 = my_followers_2
print(my_followers_2, my_followers_3)

my_followers_2 = 17
print(my_followers_2, my_followers_3)

# 2. Calculation<a id="calculation"></a> ([top](#home))
Computer calculators can be very powerful. Run the cell below and see what happens.

In [None]:
# Multiplication of user engagement metrics
user_growth_rate = 2*3
active_users = 2 * 3   # Does whitespace matter for this calculation?
monthly_engagement = 2     *     3     

print('Here are the user growth rate, active users, and monthly engagement:')
print(user_growth_rate, active_users, monthly_engagement)

# Multiply two key metrics
total_engagement = user_growth_rate * active_users
print('What is the total engagement (growth rate x active users)?')
print(total_engagement)

Here, the `print()` function directly prints messages that are enclosed in single quotation marks. Such a message is a string. More on this later.

In [None]:
# Division of revenue metrics
average_revenue_per_user = 10 / 2
ad_revenue_per_click = 20 / 5
subscription_rate = 6 / 4

print('Here are the average revenue per user, ad revenue per click, and subscription rate:')
print(average_revenue_per_user, ad_revenue_per_click, subscription_rate)

Division with `/` always returns a number with a decimal point, even when the result is whole (`10 / 2` gives `5.0`), which was not the case when we multiplied integers. Python creates variables of different types. 

**Every object in Python has a type:** integer, float, string, list, etc., which we'll introduce below. So far, we've mainly used integers. Here, dividing the integer 10 by 2 creates the float 5.0. The type of a variable can be accessed with the function `type()`.
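A small sketch of how types combine, assuming made-up values for `clicks` and `price_per_click` (these names are not used elsewhere in the tutorial). Multiplying an integer by a float gives a float, and the built-in `int()` and `float()` functions convert between the two.

```python
clicks = 6               # an integer (hypothetical value)
price_per_click = 1.5    # a float (hypothetical value)

revenue = clicks * price_per_click   # int * float gives a float
print(revenue, type(revenue))

revenue_rounded_down = int(revenue)  # int() drops the decimal part
clicks_as_float = float(clicks)      # float() turns an integer into a float
print(revenue_rounded_down, type(revenue_rounded_down))
print(clicks_as_float, type(clicks_as_float))
```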

In [None]:
print('The variable average_revenue_per_user has the value:', average_revenue_per_user,
      'and the type:', type(average_revenue_per_user))

print('\n') # line break

print('The variable user_growth_rate has the value:', user_growth_rate,
      'and the type:', type(user_growth_rate))

In [None]:
# Exponential growth in user base (note that it is not ^ but **)
initial_users = 2**3  # Represents growth over 3 time periods

print('This is the initial user base after exponential growth:', initial_users)

In [None]:
# Natural logarithms often model diminishing returns or growth rates.
digital_users = 10
log_users = log(digital_users)

print('The natural log of the number of users is:', log_users)

What happened? Python raised a `NameError`: the natural logarithm is not one of Python's built-in functions... but we can add packages with new functions to Python! The numpy package includes many numerical functions, including log. To use the functions in a package, you must first import it.

In [None]:
# Import the numpy package and give it the shorter name np.
import numpy as np

# To use a function from the numpy package, we use the 'dot' syntax
log_users = np.log(digital_users)
print(log_users)

# The opposite of the natural log is the exponential function
should_be_log_users = np.exp(log_users)
print(should_be_log_users)

No need to worry about packages and the "dot" syntax at the moment. The most important thing is to understand that there are **many functions in countless packages**.
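As one more illustration of the same idea, Python's standard library includes a `math` module that also provides `log` and `exp`, so nothing has to be installed. The sketch below assumes the same value of 10 users as above and imports the module under its full name; `back_to_users` is a name introduced here.

```python
import math

digital_users = 10
log_users = math.log(digital_users)    # natural logarithm
back_to_users = math.exp(log_users)    # exp undoes log, up to rounding

print(log_users)
print(back_to_users)
```

For single numbers either `math` or `numpy` will do; `numpy` becomes more useful once we work with whole columns of data.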

The 4 short problems in Internet economics below will take you a few minutes to complete, and will allow you to test your knowledge of Python.

## <font color='firebrick'> Practice</font>

1. Suppose you lend $300 to a small digital startup for one year at a 5\% interest rate. Calculate the repayment amount. To do so, create the variable ``principal`` and set it to 300, and the variable `i` and set it to 0.05. Create a variable named `payoff` to hold the payoff amount. Print the value of the payoff.

2. Consider the growth of an online user base for a digital platform with 5 users in the first month and one more the next month.

In a code cell, enter:

```python
users = 5
users = users + 1
```

and run the cell. What is the new value of users?

In [None]:
users = 5
users = users + 1

3. Let's continue with the user base example for a digital platform. Suppose the platform adds one more user each month. In the code cell below, enter:
```python
users=users+1
print(users)
``` 
and run the code. What happened? 

In [None]:
users=users+1
print(users)

Rerun the previous cell. What is the value of `users`? Rerun the cell again. And again. What is happening? 

In the cell below, change the format of `users` with the following code: 
```python
print('Unformatted users:', users)
print('Formatted users: %6.2f'% (users)
```

Fix the error in the code above by adding a closing parenthesis. Run the code.

The second line changes the format of the printed output. The '6' says 'allocate at least 6 places for the printout'. The '2' says 'print two decimal places.' The 'f' says 'present the information in fixed' (rather than, say, exponential) format.
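To see what the width, precision, and type letters do, here is a small sketch that formats one made-up number (`value` is a hypothetical name) with the `%` operator, once in fixed format (`f`) and once in exponential format (`e`). The vertical bars are only there to make the padding visible.

```python
value = 1234.5678   # hypothetical number, for illustration only

fixed = '%10.2f' % value        # 10 places, 2 decimals, fixed format
exponential = '%10.2e' % value  # 10 places, 2 decimals, exponential format

print('|' + fixed + '|')
print('|' + exponential + '|')
```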

4. In a code cell, set `downloads_morning = 200` and `downloads_afternoon = 300`. Write some code that swaps the values of `downloads_morning` and `downloads_afternoon`.  

# 3. Help<a id="help"></a> ([top](#home))
The Jupyter Notebook has an easy way to get help about an object. In a code cell below, enter `print?` to learn about the print function. 

In [None]:
print?

Now try `users?`

In [None]:
users?

In general, we can use the `?` to learn about any object in our programs.
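The `?` shortcut is a feature of Jupyter (IPython) rather than of Python itself. As a sketch of an alternative that also works in plain Python scripts, the built-in `help()` function prints similar documentation, and the same text is stored in the object's `__doc__` attribute. The name `len_doc` is introduced here for illustration.

```python
# help() is built into Python and prints the documentation of an object
help(len)

# The documentation text itself is stored in the __doc__ attribute
len_doc = len.__doc__
print(len_doc)
```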

# 4. Strings <a id="strings"></a> ([top](#home))
Strings are sequences of characters. Python makes it very easy to manipulate strings, which is not the case with languages such as MATLAB and Stata. In Python, strings are enclosed in quotation marks when they are assigned to variables. You can use either single or double quotation marks.

These are all legitimate strings:

```python
name = 'Jeff Bezos'
company = "Amazon"
zip_code = '98109'
```

Notice that `zip_code` looks like an integer, but... it's not! Enter these three strings in a code window. Then try `z = zip_code/3`.

In [None]:
print(type(zip_code))
print(zip_code)

We asked Python to divide a string by an integer... which it can't do.
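If the characters in a string really do represent a number, the built-in `int()` function converts them so that arithmetic works. The sketch below reuses `zip_code` from above; `zip_number` is a name introduced here for illustration. The `try` and `except` lines simply catch the error so that the rest of the cell still runs.

```python
zip_code = '98109'

# Dividing the string itself fails with a TypeError
try:
    z = zip_code / 3
except TypeError as error:
    print('Python refused:', error)

# Converting the string to an integer first makes the division possible
zip_number = int(zip_code)
print(type(zip_number))
print(zip_number / 3)
```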

However, Python does know how to do small “calculations” with strings. Open a code cell and try this:
```python
first = 'Sundar'
last = 'Pichai'
name = first + last
print(name)
```

What does `first*2` do? What about `2*first`?

In [None]:
print(first*2)

In [None]:
print(2*first)

### Quotation marks

Single and double quotation marks can be used interchangeably. The instructions `first_name = 'Jeff'` and `first_name = "Jeff"` do the same thing. 

Nevertheless, double quotation marks are needed when the string contains a single quotation mark. The instruction 
```python
definition = "Digital economy pure players don't have bricks-and-mortar retail space" 
```

works because the Python interpreter knows that the string is contained in double quotes, so it treats the single quote as a character. In a code cell, try: 
```python
definition = "They don't have a physical store"
```
and
```python
definition = 'They don't have a physical store'
```
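Another way to place a single quotation mark inside a single-quoted string is to put a backslash in front of it, which tells Python to treat it as an ordinary character. A minimal sketch (the variable names are chosen for illustration):

```python
double_quoted = "They don't have a physical store"
escaped = 'They don\'t have a physical store'   # \' is a literal apostrophe

print(escaped)
print(double_quoted == escaped)   # both spellings create the same string
```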


Finally, triple quotes (with single or double characters) are useful for creating strings that span several lines:
```python
Zuboff_quote = """
Surveillance capitalism unilaterally claims human experience as free raw material for translation 
into behavioral data.
These data are then computed and packaged as prediction products for sale.

Shoshana Zuboff, The Age of Surveillance Capitalism
"""
```


In [None]:
Zuboff_quote = """
Surveillance capitalism unilaterally claims human experience as free raw material for translation 
into behavioral data.
These data are then computed and packaged as prediction products for sale.

Shoshana Zuboff, The Age of Surveillance Capitalism
"""
print(Zuboff_quote)

A second use of triple quotes is to create long comments. We have been using `#` to create comments in our code, but we can also use triple quotes. You will often see a triple-quoted string at the beginning of a program. 

```python
"""
This quote is from Shoshana Zuboff, known for her work on the digital economy and surveillance capitalism.
"""
```

This piece of code won't do anything visible. Python reads it as a string that is never assigned or printed, so it is for humans only.

## <font color='firebrick'> Practice</font>

1. In which of the following cases is x a string?
Edit this markdown cell and type 'string' or 'not string' next to each example.

 - x = '80 000' → string or not string?
 - x = 80000 → string or not string?
 - x = "Hello World" → string or not string?
 - x = 'ThreeRedLions' → string or not string?
 - x = 4.5 → string or not string?
 - x = 'firstname.lastname@u-picardie.fr' → string or not string?

2. Fix this expression so it correctly handles the apostrophe in the string:

In [None]:
whose_platform = 'Mark Zuckerberg's'

**Hint:** You’ll need to escape the apostrophe or use a different type of quote.

To check your solutions, try running the following code examples:

In [None]:
# Example for question 1
x1 = '80 000'
x2 = 80000
x3 = "Hello World"
x4 = 'ThreeRedLions'
x5 = 4.5
x6 = 'firstname.lastname@u-picardie.fr'

print(type(x1), type(x2), type(x3), type(x4), type(x5), type(x6))

# Example for question 2: Fix the apostrophe
whose_platform = "Mark Zuckerberg's"
print(whose_platform)

3. In the first and last name example, the result was `SundarPichai`. Use the cell below to correct the code to show a space between the first and last name: `Sundar Pichai`.

4. The `\n` character is like the 'Enter' key on a word processor. Copy the following in the cell below and insert `\n` to obtain the same line break:
```
Larry Page & Sergey Brin

Sundar Pichai
```

In [None]:
print("Larry Page & Sergey Brin\n\nSundar Pichai")

# 5. Formatted printing<a id="print"></a> ([top](#home))

So far, we've printed simple output to the screen, but let's explore how to make these statements more flexible and presentable.

Consider an example where we want to say that the revenue of a GAFA firm (e.g., Google) in 2010 and 2020 was \\$29.32 billion and \\$182.53 billion, respectively. Until now, we might write this like so:

In [None]:
print("Google’s revenue in 2010 and 2020 was 29.32 and 182.53 billion dollars, respectively")

However, if these numbers were dynamically generated by a program, we would need to automate this process. Thankfully, we can do this easily using Python's `format` method.

In [None]:
print('Google’s revenue in 2010 and 2020 was {0:6.2f} and {1:6.2f} billion dollars, respectively'.format(29.32, 182.53))

There’s a lot happening here.

Our string contains two format codes (fields to be replaced) embedded in the text. The fields to be replaced are marked with curly braces {}. The values are substituted according to the rules we've set. If you want to print a curly brace itself, simply double it: {{ or }}.

A positional parameter in the format method can be accessed by its index inside the curly braces. For example, {0} accesses the first parameter, {1} the second. After the index, a colon is followed by the format string.
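A short sketch of these rules, using a made-up sentence: index `{0}` is used twice, `{1}` carries a format specification after the colon, and doubled braces print as literal braces. The names `template` and `sentence` are introduced here for illustration.

```python
template = '{0} earned {1:.2f} billion dollars in {{2010}}. Well done, {0}!'
sentence = template.format('Google', 29.32)
print(sentence)
```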

The format specification after the colon follows this structure:

```
[width][.precision]type 
```

Let’s make sense of this using the example of Google’s 2010 revenue of \\$29.32 billion. We used `{0:6.2f}`. Here's the breakdown:
* `0`: Index of the argument passed to `format()`.
* `:`: Separates the index from the format specification.
* `6`: Minimum total width allocated for the number, including the decimal point.
* `2`: Precision, i.e., two decimal places.
* `f`: Indicates fixed-point notation for a floating-point number.
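To experiment with width and precision on their own, the built-in `format()` function applies a format specification to a single value. The sketch below is only illustrative (the variable names are made up), and the vertical bars make the padding visible.

```python
revenue = 29.32

narrow = format(revenue, '6.2f')    # width 6, two decimals
wide = format(revenue, '8.3f')      # width 8, three decimals
no_width = format(revenue, '.1f')   # no width, one decimal

print('|' + narrow + '|')
print('|' + wide + '|')
print('|' + no_width + '|')
```

When the number needs fewer characters than the width, Python pads it with spaces on the left; when no width is given, no padding is added.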

These values are passed in the .format() method. If you want to switch their order, you can do this too:

In [None]:
print('Google’s revenue in 2010 and 2020 was {1:6.2f} and {0:6.2f} billion dollars, respectively'.format(182.53, 29.32))

Alternatively, to avoid managing the order of arguments, we can use keyword arguments:

In [None]:
print('Google’s revenue in 2010 and 2020 was {a:6.2f} and {b:6.2f} billion dollars, respectively'.format(a=29.32, b=182.53))

Here, `a` and `b` are placeholders for our variables. We could choose any names, such as `rev_2010` and `rev_2020`:

In [None]:
print('Google’s revenue in 2010 and 2020 was {rev_2010:6.2f} and {rev_2020:6.2f} billion dollars, respectively'.format(rev_2010=29.32, rev_2020=182.53))

A simpler way, introduced in Python 3.6, is to use f-string literals, which make formatting more concise. Here’s the equivalent using f-strings:

In [None]:
rev_2010 = 29.32
rev_2020 = 182.53
print(f"Google’s revenue in 2010 and 2020 was {rev_2010:6.2f} and {rev_2020:6.2f} billion dollars, respectively")

This method is often easier to read and requires less code, making it more suitable for interactive or dynamic reports in digital economics.
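Because the braces of an f-string can hold any expression, not only a variable name, a calculation can be formatted in place. A minimal sketch, reusing the two revenue figures from above (`growth_message` is a name introduced here):

```python
rev_2010 = 29.32
rev_2020 = 182.53

# The division is computed first, then formatted with one decimal place
growth_message = f"Revenue was multiplied by {rev_2020 / rev_2010:.1f} between 2010 and 2020"
print(growth_message)
```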

## <font color='firebrick'>Practice</font>



1. Change the decimal places to 2 for 2010 Google revenue.

You can adjust the decimal precision by modifying the format specifier to have 2 decimal places. Here's an example:

In [None]:
rev_2010 = 29.32
rev_2020 = 182.53
print(f"Google's revenue in 2010 and 2020 was {rev_2010:6.2f} and {rev_2020:6.2f} billion dollars, respectively")

2. There's an extra space between "was" and the 2010 revenue number. Fix it.

The extra space comes from the width of 6 in the format specifier: `29.32` has only five characters, so Python pads it with one leading space. Removing the width removes the padding:

In [None]:
print(f"Google's revenue in 2010 and 2020 was {rev_2010:.2f} and {rev_2020:.2f} billion dollars, respectively")

3. We can also tell Python to add commas as a thousands separator by adding a comma in the format string. Print the statement again using the comma separator, with revenue expressed in millions of dollars and two decimal places.

Use the comma format specifier to add the thousands separator:

In [None]:
rev_2010 = 29320  # Revenue in millions
rev_2020 = 182530  # Revenue in millions
print(f"Google's revenue in 2010 and 2020 was {rev_2010:,.2f} and {rev_2020:,.2f} million dollars, respectively")

This will display the numbers with commas as thousands separators and limit the decimal places to 2. You can scale the units based on your preference (e.g., billions or millions).