# Python Basics I

In this notebook, we will work with the following:

- Python basic data types.
- Python data structures.

In [1]:
# imports
import datetime

import pandas as pd

# Python

Below, we will cover:

- Basic data types: `int`, `float`, `str`.
- Data structures: lists and dictionaries.
- Functions and methods.
- Mutability.


## Preliminary items

By tradition, the first programming task for new learners is to print "Hello, World!" to the default output.
So, in the cell below, type in `print('Hello, World!')` and run the cell, either by pressing Shift-Enter or by clicking the run button (looks like the "play" symbol) in the toolbar.

In [2]:
# Type your "Hello, World" print statement below.


## Basic data types

### Integers

In [3]:
# To create an int, we can just assign a number without a decimal.
a = 2
b = 3

In [4]:
# Most operations preserve int types
print(f'Addition:        {a + b}')
print(f'Subtraction:    {a - b}')
print(f'Multiplication:  {a * b}')
print(f'Exponentiation:  {a ** b}')

Addition:        5
Subtraction:    -1
Multiplication:  6
Exponentiation:  8


In [5]:
# Division does not, but we can use floor division if that's what we want.
print(f'Division:           {b / a}')
print(f'Floor/int Division: {b // a}')

Division:           1.5
Floor/int Division: 1


In [6]:
# Printing types
print(f'Division:           {type(b / a)}')
print(f'Floor/int Division: {type(b // a)}')

Division:           <class 'float'>
Floor/int Division: <class 'int'>
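
One detail about floor division that is easy to miss: it rounds down (toward negative infinity) rather than truncating toward zero, which matters for negative numbers. Here is a minimal sketch; the variable names are illustrative and not part of the cells above.

```python
# Floor division rounds toward negative infinity, not toward zero.
pos_floor = 7 // 2     # rounds 3.5 down to 3
neg_floor = -7 // 2    # rounds -3.5 down to -4 (not -3)

# The modulo operator % gives the remainder that pairs with //.
remainder = 7 % 2

# divmod returns the floor quotient and the remainder together as a tuple.
quotient, rem = divmod(7, 2)

print(f'7 // 2:       {pos_floor}')
print(f'-7 // 2:      {neg_floor}')
print(f'7 % 2:        {remainder}')
print(f'divmod(7, 2): {(quotient, rem)}')
```

If truncation toward zero is what we want, `int(-7 / 2)` gives `-3` instead.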


### Floats

Any arithmetic operation that involves a float returns a float.
Also, in the example below, note the floating-point imprecision we talked about earlier.

In [7]:
c = 4
d = 1.1

print(f'Addition:        {c + d}')
print(f'Subtraction:     {c - d}')
print(f'Even here:       {4.1 - d}')

Addition:        5.1
Subtraction:     2.9
Even here:       2.9999999999999996
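
Because of this imprecision, testing floats for exact equality can give surprising results. One common approach, sketched here with illustrative names, is to compare within a tolerance using `math.isclose` from the standard library.

```python
import math

difference = 4.1 - 1.1

# A direct equality test fails because of binary floating-point rounding.
exact_match = (difference == 3.0)

# math.isclose compares within a (default relative) tolerance instead.
close_match = math.isclose(difference, 3.0)

# round() is handy for display, but it returns a new value and
# does not change what is stored in `difference`.
rounded = round(difference, 2)

print(f'Exact match: {exact_match}')
print(f'Close match: {close_match}')
print(f'Rounded:     {rounded}')
```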


In [8]:
# Scientific notation works, too.
e = 1.23456e02
print(f'e:        {e}')
print(f'Too far: {-1.23456e1000}')


e:        123.456
Too far: -inf


### Strings

We denote a string with single `'` or double `"` quote characters.

In [9]:
a_string = 'Hello!'
b_string = "Good way to contain 'single' quotes."

In [10]:
print(a_string)
print(b_string)

Hello!
Good way to contain 'single' quotes.


Above, I used f-strings, a feature introduced in Python 3.6.
This kind of string can contain code that is evaluated when the string is created, and it is very helpful for generating messages or other output.

In [11]:
# Replace the following with your name.
my_name = 'Jason'
print(f'My name is: {my_name}')

My name is: Jason
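
The code inside the braces can also be followed by a format specifier after a colon, which controls how the value is displayed. The sketch below uses made-up values (`price`, `share`, `ticker`) purely for illustration.

```python
price = 1234.5678
share = 0.256
ticker = 'msft'

# Thousands separator and two decimal places.
formatted_price = f'{price:,.2f}'

# Display a proportion as a percentage with one decimal place.
formatted_share = f'{share:.1%}'

# Any expression works inside the braces; >6 right-aligns to width 6.
formatted_ticker = f'{ticker.upper():>6}'

print(formatted_price)
print(formatted_share)
print(formatted_ticker)
```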


A common data handling issue is that values that are actually numeric are stored as strings.
We can convert these using the appropriate functions.
We will see later that pandas can help us with this inside dataframes.

In [12]:
f = '6'
g = '1.234e4'
print(int(f))
print(float(g))

6
12340.0
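
These conversions raise a `ValueError` when the string does not look like the target type; for example, `int('1.234e4')` fails even though `float('1.234e4')` works. Below is a small sketch of one way to handle that. The helper `to_number` is hypothetical, written only for this illustration.

```python
def to_number(text):
    """Return an int if possible, otherwise a float, otherwise None."""
    try:
        return int(text)
    except ValueError:
        pass  # Not an integer string; try float next.
    try:
        return float(text)
    except ValueError:
        return None  # Not numeric at all.


print(to_number('6'))
print(to_number('1.234e4'))
print(to_number('n/a'))
```

Returning `None` for unparseable input is a design choice for this sketch; in other settings, letting the error propagate may be preferable.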


## Boolean values

Python has special values `True` and `False` that are boolean.
They can be used directly, and they are also returned by comparisons.
As we will see later, they are useful for conditional logic.

In [13]:
h = True
i = False

In [14]:
# We can test values.
h is False

False

In [15]:
i is False

True

In [16]:
# We can use logical operators, too.
h or i

True

In [17]:
h and i

False

In [18]:
# Exclusive OR operator.
h ^ i

True

In [19]:
# Underneath, these are actually just the ints 0 and 1.
# I wouldn't make a habit of using them this way.
h + 1

2
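
As mentioned at the start of this section, comparisons also return booleans. A short sketch, using illustrative names `x` and `y`:

```python
x = 2
y = 3

is_less = x < y                # less-than comparison
is_equal = x == y              # == tests equality of values
in_range = 0 < x < 5           # comparisons can be chained
both = (x < y) and (y < 10)    # combine comparisons with and/or
negated = not is_equal         # not flips a boolean

print(is_less, is_equal, in_range, both, negated)
```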

## None type

Python has a special value called `None` (its type is `NoneType`) that is used for null values.
Often, we want to test whether something exists before trying to work with it, as a way of avoiding errors.

In [20]:
j = None

In [21]:
h is None

False

In [22]:
j is None

True
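
To illustrate testing for `None` before working with a value, here is a minimal sketch. The function `describe_ticker` is hypothetical and exists only for this example.

```python
def describe_ticker(ticker=None):
    """Return a short message, guarding against a missing ticker."""
    # Calling .upper() on None would raise an AttributeError,
    # so we check for None first.
    if ticker is None:
        return 'No ticker provided.'
    return f'Ticker: {ticker.upper()}'


print(describe_ticker())
print(describe_ticker('msft'))
```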

## Datetime

Dates and times are a big topic, and we are only scratching the surface.
See the [documentation](https://docs.python.org/3/library/datetime.html) for more.
Also, we will often work with dates in pandas dataframes, and pandas has datetime tools, too.

In [23]:
a_datetime = datetime.datetime.now()
print(a_datetime)

2020-05-28 13:55:57.883677


In [24]:
# We can get components.
print(a_datetime.year)
print(a_datetime.month)
print(a_datetime.day)

2020
5
28


In [25]:
# Using timedelta, we can add or subtract units of time.
print(a_datetime - datetime.timedelta(days=365))

2019-05-29 13:55:57.883677
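
Two other operations come up often: converting between strings and datetimes, and taking the difference between two datetimes. The sketch below uses fixed example dates (my own choice, so the results are reproducible) instead of `now()`.

```python
import datetime

# Parse a string into a datetime using a format code.
start = datetime.datetime.strptime('2020-05-28', '%Y-%m-%d')

# Build a datetime directly from its components.
end = datetime.datetime(2020, 12, 31)

# Subtracting two datetimes gives a timedelta.
elapsed = end - start

# Format a datetime back into a string.
label = start.strftime('%m/%d/%Y')

print(f'Elapsed days: {elapsed.days}')
print(f'Label:        {label}')
```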


# Data structures

These are Python's built-in data structures.
They are simple, yet powerful, and they have pretty good performance.
However, for tabular data, we use a tool designed for that purpose, like pandas.

## Lists

We're just scratching the surface with lists, but the [documentation](https://docs.python.org/3/tutorial/introduction.html#lists) is great.

In [26]:
a_list = ['Lists', 'are', 'quite', 'helpful']
print(a_list)

['Lists', 'are', 'quite', 'helpful']


In [27]:
# We can mix types, though we often don't.
b_list = [
    1,
    'two',
    3.0,
    4
]
print(b_list)

[1, 'two', 3.0, 4]


In [28]:
# Access with indices and slicing
print(b_list[0])
print(a_list[:2])

1
['Lists', 'are']
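
Lists are mutable, meaning we can change them in place after creating them. The sketch below (with made-up list contents) shows appending, negative indexing, and the difference between a second name for the same list and a copy. Strings, by contrast, are immutable, so string methods return new strings.

```python
tickers = ['msft', 'aapl']

# append modifies the list in place.
tickers.append('goog')
last = tickers[-1]        # negative indices count from the end

# Assignment creates a second name for the SAME list, not a copy.
alias = tickers
alias[0] = 'ibm'          # this also changes `tickers`

# copy() creates a separate list that can change independently.
copied = tickers.copy()
copied.append('amzn')

# Strings are immutable: upper() returns a new string.
name = 'msft'
upper_name = name.upper()

print(tickers)
print(copied)
print(name, upper_name)
```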


## Dictionaries

As with lists, see the [documentation](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) for more.
There are some common things, like the pandas [rename method](https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-rename), that take dictionaries.

In [29]:
a_dict = {'A': 1, 'B': 2, 'C': 3}
print(a_dict)

{'A': 1, 'B': 2, 'C': 3}


In [30]:
# Access by key.
a_dict['B']

2

In [31]:
# We can add new keys after creating the dictionary.
a_dict['D'] = 4
print(a_dict)

{'A': 1, 'B': 2, 'C': 3, 'D': 4}


In [32]:
# We can also combine dictionaries using the update method.
b_dict = {'E': 5, 'F': 6}
print(b_dict)
a_dict.update(b_dict)
print(a_dict)

{'E': 5, 'F': 6}
{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}
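
Accessing a key that does not exist with square brackets raises a `KeyError`. The sketch below shows a few ways to work around that and to loop over a dictionary; the dictionary `founding_years` is an illustrative example, not part of the cells above.

```python
founding_years = {'msft': 1975, 'aapl': 1976}

# Test for membership before accessing a key.
has_msft = 'msft' in founding_years

# get returns None (or a default we supply) instead of raising KeyError.
missing = founding_years.get('goog')
fallback = founding_years.get('goog', 0)

# items() yields (key, value) pairs for looping.
lines = [f'{ticker}: {year}' for ticker, year in founding_years.items()]

print(has_msft)
print(missing, fallback)
print(lines)
```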


## Nested data structures

Data structures can also contain other data structures (and potentially multiple levels deep).
One common use for us is to represent a row of data as a dictionary, where the keys can be thought of as column names and the values are the values of variables.
Then, we put those dicts into a list where each dict is an element of the list.
Afterward, we can pass this structure to pandas (which we cover later), and it will make a nice dataframe for us.

In [33]:
a_dict_list = [
    {
        'ticker': 'msft',
        'founding_year': 1975
    },
    {
        'ticker': 'aapl',
        'founding_year': 1976
    }
]
print(a_dict_list)

[{'ticker': 'msft', 'founding_year': 1975}, {'ticker': 'aapl', 'founding_year': 1976}]


In [34]:
pd.DataFrame(a_dict_list)

| | ticker | founding_year |
|---|---|---|
| 0 | msft | 1975 |
| 1 | aapl | 1976 |
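
Without pandas, we can still pull values out of a nested structure by chaining the access syntax for each level. A minimal sketch, using a list named `rows` (an illustrative name) with the same contents as above:

```python
rows = [
    {'ticker': 'msft', 'founding_year': 1975},
    {'ticker': 'aapl', 'founding_year': 1976},
]

# Index into the list first, then access the dictionary by key.
first_ticker = rows[0]['ticker']

# A list comprehension collects one field from every row.
tickers = [row['ticker'] for row in rows]

# min with a key function finds the row with the earliest founding year.
oldest = min(rows, key=lambda row: row['founding_year'])

print(first_ticker)
print(tickers)
print(oldest)
```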


# Breakout Exercises

Let's do a few exercises to reinforce the concepts we learned above.


1. ints and floats
1. strings
1. dictionaries

## EX1: ints and floats

This is fairly straightforward, so let's calculate two things.

1. Subtract the number of doctoral students in your breakout room from the total number of participants.
1. Compute the proportion of doctoral students to total participants in your breakout room.

In [35]:
# 1-1 code


In [36]:
# 1-2 code


## EX2: strings

1. Create a string containing `Hello, World!` and assign it to the name `z_string`.
1. Take `z_string` and add (using `+`) the string ` I'm [yourname].` to it (filling in your name).
1. Display the original string, verifying that the last step didn't change it.

In [37]:
# 2-1 code


In [38]:
# 2-2 code


In [39]:
# 2-3 code


## EX3: dictionaries

Dictionaries are useful for a number of things, including as a lightweight way to represent a row of data.
When we use them this way, the keys are like column names, and the values are like the actual data.

1. Assign `z_dict` to a dictionary with the key `'name'` and a value of the name of the breakout room member who is screen sharing.
1. Update the dictionary with another key `'affil'` and a value of the affiliation of the breakout room member who is screen sharing.
1. Use the indexing syntax to retrieve the value corresponding with the key `'affil'`.

In [40]:
# 3-1 code


In [41]:
# 3-2 code


In [42]:
# 3-3 code


# Bonus topics

We won't cover these directly, but here are two other topics that are good to know.

1. **Reading and writing to files.** See the relevant Python [tutorial](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). Two things to note:
    1. For most things we do, we don't need to be this low level. For example, pandas will easily read and write structured data like CSV, Stata, SAS. You should always prefer the higher-level tools if they can do what you want.
    1. I usually use this for either reading a list of identifiers that I created at some other time (e.g., the gvkeys of all S&P 500 firms) or for recording which observations I have already processed (e.g., when web scraping).
1. **Handling errors.** See the relevant Python [tutorial](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) on `try-except` clauses (which will handle most of what we tend to need in research). In particular, note what they say about using `except` without a specified exception (in other resources, this is often called a "bare" or "naked" `except`). This will capture anything, including the `KeyboardInterrupt` exception that is raised when you use the Kernel:Interrupt Kernel menu item, and that isn't usually what you want.
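
As a rough sketch of both bonus topics together, the example below writes a list of identifiers to a text file, reads it back, and then handles one specific exception when a file is missing. The identifiers and file names are made-up placeholders, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile

identifiers = ['001', '002', '003']  # placeholder values, not real gvkeys

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'identifiers.txt')

    # Write one identifier per line; the with block closes the file for us.
    with open(path, 'w') as f:
        for identifier in identifiers:
            f.write(identifier + '\n')

    # Read the lines back, stripping the trailing newline characters.
    with open(path) as f:
        loaded = [line.strip() for line in f]

    # Catch only the specific exception we expect, not everything.
    missing_path = os.path.join(tmp_dir, 'does_not_exist.txt')
    try:
        with open(missing_path) as f:
            contents = f.read()
    except FileNotFoundError:
        contents = None

print(loaded)
print(contents)
```

Naming `FileNotFoundError` explicitly means that other problems, including interrupting the kernel, still surface as errors instead of being silently swallowed.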