# Python data types and basic syntax

This is a quick overview and reference for many of the fundamentals that we might not have time to stop and go through in detail:

- [Basic data types](#Basic-data-types)
- [Variable assignment](#Variable-assignment)
- [String methods](#String-methods)
- [Comments](#Comments)
- [The print() function](#The-print()-function)
- [Doing math in Python](#Doing-math-in-Python)
- [Collections of data](#Collections-of-data)
    - [Lists](#Lists)
    - [Dictionaries](#Dictionaries)
- [Indentation](#Indentation)
- [`for` loops](#for-loops)
- [`if` statements](#if-statements)
- [Dealing with errors](#Dealing-with-errors)

### Basic data types
Just like Excel and other data processing software, Python recognizes a variety of data types, including three we'll focus on here:
- Strings (text)
- Numbers (integers, numbers with decimals and more)
- Booleans (`True` and `False`).

You can use the [`type()`](https://docs.python.org/3/library/functions.html#type) function to check the data type of a value.

#### Strings

A string is a group of characters -- letters, numbers, whatever -- enclosed within single or double quotes (doesn't matter as long as they match). The code in these notebooks uses single quotes. (The Python style guide doesn't recommend one over the other: ["Pick a rule and stick to it."](https://www.python.org/dev/peps/pep-0008/#string-quotes))

If your string _contains_ apostrophes or quotes, you have two options: _Escape_ the offending character with a backslash `\`:

```python
'Isn\'t it nice here?'
```

... or change the surrounding punctuation:

```python
"Isn't it nice here?"
```

The style guide recommends the latter over the former.

When you check the `type()` of a string, Python will return `str`.

Calling [`str()`](https://docs.python.org/3/library/stdtypes.html#str) on a value will return the string version of that value (see example below).

In [20]:
'Investigative Reporters and Editors'

'Investigative Reporters and Editors'

In [23]:
type('hello!')

str

In [21]:
'45'

'45'

In [24]:
str(45)

'45'

In [22]:
'True'

'True'

If you "add" strings together with a plus sign `+`, it will concatenate them:

In [71]:
'IRE' + '/' + 'NICAR'

'IRE/NICAR'
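One wrinkle worth seeing in action: `+` only concatenates strings with other strings, so a number has to go through `str()` first. Here's a minimal sketch; the variable names `org`, `year` and `label` are made up for illustration.

```python
# hypothetical values, used only for illustration
org = 'IRE'
year = 2019

# org + ' ' + year would raise a TypeError, so convert the number first
label = org + ' ' + str(year)
print(label)
```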

#### Numbers

Python recognizes a variety of numeric data types. Two of the most common are integers (whole numbers) and floats (numbers with decimals).

Calling `int()` on a piece of numeric data will attempt to coerce it to an integer (it drops anything after the decimal point rather than rounding); calling `float()` will try to convert it to a float.

In [25]:
12

12

In [29]:
12.4

12.4

In [30]:
type(12)

int

In [31]:
type(12.4)

float

In [32]:
int(35.6)

35

In [33]:
float(46)

46.0
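To make the difference between coercing and rounding concrete, here's a small sketch. It also shows that `int()` and `float()` will accept numbers stored as text, which comes up a lot when you're reading data from files. The variable names are just for illustration.

```python
# int() drops the decimal portion; round() goes to the nearest whole number
truncated = int(35.6)
rounded = round(35.6)

# numbers stored as text can be converted, too
from_text = int('45')
price = float('12.4')

print(truncated, rounded, from_text, price)
```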

#### Booleans

Just like in Excel, which has `TRUE` and `FALSE` data types, Python has boolean data types. They are `True` and `False` -- note that only the first letter is capitalized.

Boolean values are typically returned when you're evaluating a logical statement.

In [34]:
True

True

In [35]:
False

False

In [36]:
4 > 6

False

In [37]:
'ell' in 'Hello'

True

In [38]:
type(True)

bool
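A few more logical statements that return booleans, as a quick sketch: `==` tests whether two values are equal, `!=` tests whether they're different, and `and` / `or` combine conditions. The variable names here are made up for illustration.

```python
# == checks equality (a single = is for assigning variables)
is_equal = 5 == 5.0

# != checks that two values are different; strings are case-sensitive
is_different = 'a' != 'A'

# "and" needs both sides to be True; "or" needs at least one
both = (4 > 6) and (2 < 3)
either = (4 > 6) or (2 < 3)

print(is_equal, is_different, both, either)
```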

### Variable assignment

The `=` sign assigns a value to a variable name that you choose. Later, you can retrieve that value by referencing its variable name. Variable names can be pretty much anything you want ([as long as you follow some basic rules](https://thehelloworldprogram.com/python/python-variable-assignment-statements-rules-conventions-naming/)).

In a Jupyter notebook, any value assigned to a variable will be available once you _run_ the cell. Otherwise it won't be available.

This can be a tricky concept at first! For more detail, [here's a pretty good explainer from Digital Ocean](https://www.digitalocean.com/community/tutorials/how-to-use-variables-in-python-3).

In [6]:
my_name = 'Cody'

In [39]:
my_name

'Cody'

You can also _reassign_ a different value to a variable name, though it's (usually) better practice to create a new variable.

In [40]:
my_name = 'Jacob'

In [41]:
my_name

'Jacob'

### String methods

Let's go back to strings for a second. String objects have a number of useful [methods](https://docs.python.org/3/library/stdtypes.html#string-methods) -- let's use an example string to demonstrate a few common ones.

In [56]:
my_cool_string = '    Hello, Pittsburgh!'

`upper()` converts the string to uppercase:

In [57]:
my_cool_string.upper()

'    HELLO, PITTSBURGH!'

`lower()` converts to lowercase:

In [58]:
my_cool_string.lower()

'    hello, pittsburgh!'

`replace()` will replace a piece of text with other text that you specify:

In [68]:
my_cool_string.replace('Pitt', 'Frick')

'    Hello, Fricksburgh!'

`count()` will count the number of occurrences of a character or group of characters: 

In [73]:
my_cool_string.count('H')

1

Note that `count()` is case-sensitive. If your task is "count all the h's," convert your original string to upper or lowercase first:

In [74]:
my_cool_string.upper().count('H')

2

[`split()`](https://docs.python.org/3/library/stdtypes.html#str.split) will split the string into a [_list_](#Lists) (more on these in a second) on a given delimiter (if you don't specify a delimiter, it'll split on any run of whitespace and ignore whitespace at the beginning and end of the string):

In [59]:
my_cool_string.split()

['Hello,', 'Pittsburgh!']

In [60]:
my_cool_string.split(',')

['    Hello', ' Pittsburgh!']

In [61]:
my_cool_string.split('Pitt')

['    Hello, ', 'sburgh!']

`strip()` removes whitespace from either side of your string (but not internal whitespace):

In [62]:
my_cool_string.strip()

'Hello, Pittsburgh!'

You can use a cool thing called "method chaining" to combine methods -- just tack 'em onto the end. Let's say we wanted to strip whitespace from our string _and_ make it uppercase:

In [63]:
my_cool_string.strip().upper()

'HELLO, PITTSBURGH!'

Notice, however, that our original string is unchanged:

In [64]:
my_cool_string

'    Hello, Pittsburgh!'

Why? Because we haven't assigned the results of anything we've done to a variable. A common thing to do, especially when you're cleaning data, would be to assign the results to a new variable:

In [65]:
my_cool_string_clean = my_cool_string.strip().upper()

In [66]:
my_cool_string_clean

'HELLO, PITTSBURGH!'
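Here's how that pattern might look in a data-cleaning situation: a dollar amount that arrived as messy text. This is a sketch with a made-up value and made-up variable names, not a general-purpose currency parser.

```python
# hypothetical messy value, e.g. from a spreadsheet cell
raw_value = '  $1,250.00 '

# chain methods: trim whitespace, then drop the dollar sign and comma
cleaned = raw_value.strip().replace('$', '').replace(',', '')

# now the text can be converted to a number
amount = float(cleaned)

print(cleaned)
print(amount)
```

The original `raw_value` is unchanged; the cleaned-up results live in the new variables.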

### Comments
A line with a comment -- a note that you don't want Python to interpret -- starts with a `#` sign. These are notes to collaborators and to your future self about what's happening at this point in your script, and why.

Typically you'd put this on the line right above the line of code you're commenting on:

In [69]:
# coercing this to an int because we don't need any decimal precision
avg_settlement = 40827348.34328237
int(avg_settlement)

40827348

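Python doesn't have a separate syntax for multi-line comments, but a string sandwiched between triple quotes (or triple apostrophes) that isn't assigned to a variable is commonly used as one: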

`'''
this
is a long
comment
'''`

or

`"""
this
is a long
comment
"""`

Here's a real-live comment I used in a script:

In [None]:
'''
Given a price, a base year index and the current year index, this will return the adjusted value
   See: https://www.bls.gov/cpi/factsheets/cpi-math-calculations.pdf#page=2
   Ctrl+F for "constant dollars"
'''
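A triple-quoted note like this often sits at the top of a function, where it's called a "docstring." Here's a sketch of what that could look like. The function name, the parameter names and the formula (scaling the price by the ratio of the two index values) are assumptions for illustration, not the code from the original script.

```python
def adjust_for_inflation(price, base_year_index, current_year_index):
    '''
    Given a price, a base year index and the current year index,
    return the price expressed in current-year dollars.
    '''
    # assumed formula: scale the price by the ratio of the two indexes
    return price * (current_year_index / base_year_index)


# made-up index values, just to show the function running
adjusted = adjust_for_inflation(100, 200.0, 250.0)
print(adjusted)
```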

### The `print()` function

So far, we've just been running the notebook cells to get the last value returned by the code we write. Using the [`print()`](https://docs.python.org/3/library/functions.html#print) function is a way to print specific things in your script to the screen.

To print multiple things on the same line, separate them with a comma.

In [72]:
print('Hello!')
print(my_name)
print('Hello,', my_name)

Hello!
Jacob
Hello, Jacob
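Two other ways to control what gets printed, as a quick sketch: the `sep` argument to `print()` changes what goes between the items (the default is a single space), and an f-string lets you drop a variable directly into a piece of text. The variable `greeting` is made up for illustration.

```python
my_name = 'Jacob'

# sep replaces the default single space between items
print('Hello', my_name, sep=', ')

# an f-string substitutes the value of my_name inside the braces
greeting = f'Hello, {my_name}!'
print(greeting)
```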


### Doing math in Python

You can do [basic math](https://www.digitalocean.com/community/tutorials/how-to-do-math-in-python-3-with-operators) in Python. You can also do [more advanced math](https://docs.python.org/3/library/math.html).

In [75]:
4+2

6

In [76]:
10-9

1

In [77]:
5*10

50

In [78]:
1000/10

100.0

In [79]:
# ** raises a number to the power of another number
5**2

25

In [84]:
# % returns the remainder of a division problem
100 % 8

4

In [85]:
# divmod() returns the quotient ~and~ the remainder
divmod(100, 8)

(12, 4)
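A few related operations, sketched below: `/` always returns a float, `//` returns just the whole-number quotient, and `round()` takes an optional second argument for the number of decimal places to keep. The variable names are just for illustration.

```python
# / always returns a float, even when the division comes out even
exact = 100 / 8

# // is "floor division": the quotient without the remainder
quotient = 100 // 8
remainder = 100 % 8

# round() to two decimal places
rounded = round(10 / 3, 2)

print(exact, quotient, remainder, rounded)
```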

## Collections of data

Now we're going to talk about two ways you can use Python to group data into a collection: lists and dictionaries.

### Lists

A _list_ is a comma-separated collection of items inside square brackets: `[]`.

Here's a list of ingredients, each one a string, that together makes up a salsa recipe.

In [86]:
salsa_ingredients = ['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']

To get an item out of a list, you'd refer to its numerical position in the list -- its _index_ -- inside square brackets immediately following your reference to that list. In Python, as in many other programming languages, counting starts at 0 (so the indexes run 0, 1, 2, etc.). That means the first item in a list is item `0`.

In [87]:
salsa_ingredients[0]

'tomato'

In [88]:
salsa_ingredients[1]

'onion'

You can use _negative indexing_ to grab things from the right-hand side of the list -- and in fact, `[-1]` is a common idiom for getting "the last item in a list" when it's not clear how many items are in your list.

In [89]:
salsa_ingredients[-1]

'cilantro'

If you wanted to get a slice of multiple items out of your list, you'd use colons (just like in Excel, kind of!).

If you wanted to get the first three items, you'd do this:

In [91]:
salsa_ingredients[0:3]

['tomato', 'onion', 'jalapeño']

You could also have left off the initial 0 -- when you leave out the first number, Python starts from the first item in the list. In the same way, if you leave off the last number, Python keeps going through the last item in the list.

In [92]:
salsa_ingredients[:3]

['tomato', 'onion', 'jalapeño']

Note, too, that this slice is giving us items 0, 1 and 2. The `3` in our slice is the first item we _don't_ want. That can be kind of confusing at first. Let's try a few more:

In [93]:
# everything in the list except the first item
salsa_ingredients[1:]

['onion', 'jalapeño', 'lime', 'cilantro']

In [96]:
# the second, third and fourth items
salsa_ingredients[1:4]

['onion', 'jalapeño', 'lime']

In [98]:
# the last two items
salsa_ingredients[-2:]

['lime', 'cilantro']

To see how many items are in a list, use the `len()` function:

In [115]:
len(salsa_ingredients)

5

To add an item to a list, use the [`append()`](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) method:

In [99]:
salsa_ingredients

['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']

In [100]:
salsa_ingredients.append('mayonnaise')

In [101]:
salsa_ingredients

['tomato', 'onion', 'jalapeño', 'lime', 'cilantro', 'mayonnaise']

Haha _gross_. To remove an item from a list, use the `pop()` method (if you don't specify the index number of the item you want to pop out, it will default to "the last item").

In [102]:
salsa_ingredients.pop()

'mayonnaise'

In [103]:
salsa_ingredients

['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']

You can use the [`in` and `not in`](https://docs.python.org/3/reference/expressions.html#membership-test-operations) expressions, among others, to test membership in a list (they'll return a boolean):

In [122]:
'lime' in salsa_ingredients

True

In [124]:
'cilantro' not in salsa_ingredients

False
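Here's a short recap of those list operations on a fresh (made-up) list, including `pop()` with a specific index number, which wasn't demonstrated above.

```python
# a hypothetical list, so we don't disturb the salsa
guac_ingredients = ['avocado', 'onion', 'lime']

# add an item to the end
guac_ingredients.append('cilantro')

# pop out the item at index 1 (the second item)
removed = guac_ingredients.pop(1)

# slice the first two items of what's left
first_two = guac_ingredients[:2]

print(removed)
print(guac_ingredients)
print(first_two)
print(len(guac_ingredients))
print('onion' in guac_ingredients)
```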

### Dictionaries

A _dictionary_ is a comma-separated list of key/value pairs inside curly brackets: `{}`. Let's make an entire salsa recipe:

In [104]:
salsa = {
    'ingredients': salsa_ingredients,
    'instructions': 'Chop up all the ingredients and cook them for awhile.',
    'oz_made': 12
}

To retrieve a value from a dictionary, you'd refer to the name of its key inside square brackets `[]` immediately after your reference to the dictionary:

In [105]:
salsa['oz_made']

12

In [106]:
salsa['ingredients']

['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']

To add a new key/value pair to a dictionary, assign a new key to the dictionary inside square brackets and set the value of that key with `=`:

In [107]:
salsa['tastes_great'] = True

In [108]:
salsa

{'ingredients': ['tomato', 'onion', 'jalapeño', 'lime', 'cilantro'],
 'instructions': 'Chop up all the ingredients and cook them for awhile.',
 'oz_made': 12,
 'tastes_great': True}

To delete a key/value pair out of a dictionary, use the `del` command and reference the key:

In [109]:
del salsa['tastes_great']

In [110]:
salsa

{'ingredients': ['tomato', 'onion', 'jalapeño', 'lime', 'cilantro'],
 'instructions': 'Chop up all the ingredients and cook them for awhile.',
 'oz_made': 12}
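Two more things you might want to do with a dictionary, as a sketch: update the value of an existing key (same syntax as adding a new one) and safely look up a key that might not exist. Asking for a missing key with square brackets raises a `KeyError`; the `get()` method returns `None` (or a default you supply) instead. The small dictionary and the `price` key are made up for illustration.

```python
# a pared-down, hypothetical version of the recipe
salsa = {
    'instructions': 'Chop and mix.',
    'oz_made': 12
}

# assigning to an existing key overwrites its value
salsa['oz_made'] = 16

# test whether a key exists
has_price = 'price' in salsa

# get() returns None for a missing key, or a default if you give one
price = salsa.get('price')
price_or_zero = salsa.get('price', 0)

print(salsa['oz_made'], has_price, price, price_or_zero)
```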

### Indentation

Whitespace matters in Python. Sometimes you'll need to indent bits of code to make things work. This can be confusing! `IndentationError`s are common even for experienced programmers. (FWIW, Jupyter will try to be helpful and insert the correct amount of "significant whitespace" for you.)

You can use tabs or spaces, just don't mix them. [The Python style guide](https://www.python.org/dev/peps/pep-0008/) recommends indenting your code in groups of four spaces, so that's what we'll use.
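To see why the whitespace matters, here's a small sketch using a `for` loop (covered in the next section). The indented line belongs to the loop and runs once per item; the unindented `print()` runs once, after the loop is finished. The list and variable names are made up for illustration.

```python
doubled = []

for number in [1, 2, 3]:
    # indented four spaces: part of the loop, runs once per item
    doubled.append(number * 2)

# not indented: outside the loop, runs once at the end
print(doubled)
```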

### `for` loops

You would use a `for` loop to iterate over a collection of things. The statement begins with the keyword `for` (lowercase), then a temporary `variable_name` of your choice to represent the items in the thing you're looping over, then the Python keyword `in`, then the collection you're looping over (or its variable name), then a colon, then the indented block of code with instructions about what to do with each item in the collection.

Let's say we have a list of numbers, `ls`.

In [13]:
ls = [1, 2, 3, 4, 5, 6]

We could loop over the list and print out each number:

In [14]:
for number in ls:
    print(number)

1
2
3
4
5
6


We could print out each number _times 6_:

In [15]:
for number in ls:
    print(number*6)

6
12
18
24
30
36


... whatever you need to do in your loop. Note that the variable name `number` in our loop is totally arbitrary. This also would work:

In [111]:
for banana in ls:
    print(banana)

1
2
3
4
5
6


It can be hard, at first, to figure out what's a "Python word" and what's a variable name that you get to define. This comes with practice.
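A loop doesn't have to print anything. A common pattern is to build up a result as you go. Here's a minimal sketch that keeps a running total of the numbers in the list; `total` is a variable name chosen for illustration.

```python
ls = [1, 2, 3, 4, 5, 6]

# start at zero, then add each number in turn
total = 0

for number in ls:
    total = total + number

print(total)
```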

Strings are iterable, too. Let's loop over the letters in a sentence:

In [16]:
sentence = 'We are in Pittsburgh!'

for letter in sentence:
    print(letter)

W
e
 
a
r
e
 
i
n
 
P
i
t
t
s
b
u
r
g
h
!


To this point: Because strings are ordered sequences, like lists, many of the same operations work on them, including slicing, `len()` and `in`:

In [118]:
# get the first five characters
sentence[:5]

'We ar'

In [119]:
# get the length of the sentence
len(sentence)

21

In [125]:
'Pitt' in sentence

True

You can iterate over dictionaries, too. As of Python 3.7, a dictionary keeps its items in the order they were added; older versions of Python made no such guarantee.

When you're looping over a dictionary, the variable name in your `for` loop will refer to the keys. Let's loop over our `salsa` dictionary from up above to see what I mean.

In [113]:
for key in salsa:
    print(key)

ingredients
instructions
oz_made


To get the _value_ of a dictionary item in a for loop, you'd need to use the key to retrieve it from the dictionary:

In [114]:
for key in salsa:
    print(key, '=>', salsa[key])

ingredients => ['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']
instructions => Chop up all the ingredients and cook them for awhile.
oz_made => 12
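Another way to get at the keys and the values together is the dictionary's `items()` method, which hands you both on each trip through the loop. A sketch, using a pared-down, made-up version of the recipe; the list `keys_seen` is only there to record the order.

```python
# a hypothetical, shortened recipe
salsa = {
    'ingredients': ['tomato', 'onion', 'lime'],
    'oz_made': 12
}

keys_seen = []

# two variable names, separated by a comma: one for the key, one for the value
for key, value in salsa.items():
    print(key, '=>', value)
    keys_seen.append(key)
```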


### `if` statements
Just like in Excel, you can use the "if" keyword to handle conditional logic.

These statements begin with the keyword `if` (lowercase), then the condition to evaluate, then a colon, then a new line with a block of indented code to execute if the condition resolves to `True`.

In [None]:
if 4 < 6:
    print('4 is less than 6')

You can also add an `else` statement (and a colon) with an indented block of code you want to run if the condition resolves to `False`.

In [None]:
if 4 > 6:
    print('4 is greater than 6?!')
else:
    print('4 is not greater than 6.')

If you need to, you can add multiple conditions with `elif`.

In [None]:
HOME_SCORE = 6
AWAY_SCORE = 8

if HOME_SCORE > AWAY_SCORE:
    print('we won!')
elif HOME_SCORE == AWAY_SCORE:
    print('we tied!')
else:
    print('we lost!')
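`if` statements get really useful inside a `for` loop, where you can make a decision about each item in a collection. Here's a sketch that sorts some dollar amounts into categories. The amounts and the cutoffs are made up for illustration.

```python
# hypothetical dollar amounts
settlements = [500, 12000, 75000]
labels = []

for amount in settlements:
    # conditions are checked top to bottom; the first match wins
    if amount >= 50000:
        labels.append('large')
    elif amount >= 1000:
        labels.append('medium')
    else:
        labels.append('small')

print(labels)
```

Note the two levels of indentation: four spaces for the loop, then four more for the code under each condition.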

### Dealing with errors

Run the code in the following cell:

In [121]:
print(salsa_ingredients[0])
print(salsa_ingredients[-1])
print(salsa_ingredients[100])

tomato
cilantro


IndexError: list index out of range

Hooray! Our first error (maybe). Errors are extremely common, happen to literally every person who writes code and are not evidence that you are dumb or that this kind of work isn't for you or whatever other terrible thing you tell yourself when errors pop up.

They can be frustrating! There is a strategy for solving them, though. Let's see if we can figure this one out.

First thing: Read error messages (called "tracebacks") from the bottom up. We're getting something called an `IndexError`, and it's saying "list index out of range."

Moving upward: The error message points to the offending line of code, line 3 of the cell.

Maybe, from here, we can figure out the error. (Answer: We don't have 100 items in our list.) If not, I would Google the exact text of the error from the bottom line of the traceback (the one we read first), and maybe the word "python": ["IndexError: list index out of range" python](https://www.google.com/search?q=%22IndexError%3A+list+index+out+of+range%22+python). You'll get _very_ acquainted with Stack Overflow.
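Once you know what caused the error, there are a couple of ways you might guard against it. This is a sketch of two options, not the only approach: check the length of the list before asking for an item, or wrap the risky line in a `try` / `except` block so the script can keep going. The variable names `index`, `item` and `message` are made up for illustration.

```python
salsa_ingredients = ['tomato', 'onion', 'jalapeño', 'lime', 'cilantro']
index = 100

# option 1: make sure the index exists before using it
if index < len(salsa_ingredients):
    item = salsa_ingredients[index]
else:
    item = None

# option 2: try it, and handle the IndexError if one comes up
try:
    print(salsa_ingredients[index])
except IndexError as error:
    message = str(error)
    print('Something went wrong:', message)
```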

👉 For more information on debugging errors, [check out this notebook](Debugging%20strategies.ipynb).