# Week 2: Data Feminism, Distant Reading, Type Token Ratios... and Python Basics

## Part 1: Data Feminism and Distant Reading

In the first part of the lecture, AH will lead a discussion of this week's readings:
* Kathryn Schulz, “What is Distant Reading?”, *New York Times*
* Catherine D'Ignazio and Lauren Klein, “Why Data Science Needs Feminism,” *Data Feminism*
* Franco Moretti, “Conjectures on World Literature”


## Part 2: Type/Token Ratios

The above discussion will lead into AH's introduction of our first computational metric for literary analysis: the Type/Token Ratio or TTR.
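
For reference, TTR is usually defined as the number of distinct words in a text (its *types*) divided by the total number of words (its *tokens*). Here is a minimal sketch of that calculation, assuming a toy sentence and a very simple idea of what counts as a word (lowercase, separated by spaces). It borrows a few Python tools we haven't met yet (`.split()`, `set()`, `len()`), so treat it as a preview rather than the official method for this course.

```python
# A toy sentence; any text would do.
text = "the cat sat on the mat and the dog sat on the rug"

# Assumption: a token is any lowercase chunk of text separated by spaces.
tokens = text.lower().split()

# Types are the distinct tokens; a set keeps just one copy of each.
types = set(tokens)

type_token_ratio = len(types) / len(tokens)

print(f"{len(types)} types / {len(tokens)} tokens = {type_token_ratio:.2f}")
```

A higher ratio means less repetition; real texts would need more careful handling of punctuation and capitalization than this sketch provides.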

## Part 3: Python Basics: Variables, the Four Basic Data Types, Operators, and the Concept of a Function

This part of the lecture draws on Melanie Walsh's sections on [variables](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/04-Variables.html) and [data types](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/05-Data-Types.html). Definitely check these out if you're unclear on anything from this section of the lecture — Walsh's textbook is an amazing resource!


## Week 2 Lab

Don't forget that you have another lab due this week: Wednesday @ 10pm, for discussion during your Thursday tutorial.


## Week 2 Homework

This week, you have your first Homework assignment. It will be released after tutorial. Homeworks are autograded. DL will lead us through a quick tutorial about how to submit homework for autograding. Remember: homeworks are due by 10pm on the Monday following the lecture on which they are based. And you have unlimited attempts for submission!

---


# 3 Python Fundamentals

Let's start getting a handle on some fundamental aspects of the Python programming language. And then let's have a super fun time with a role-playing game involving envelopes and pieces of paper!

## 3a Variables

Think of variables as envelopes or boxes that you can put stuff in. For instance, numbers, words, sentences, paragraphs... novels!

Variables have two main aspects: **names** and **values**. The name allows Python to know where to look for the cool contents of the envelope. The "value" of the variable is those cool contents.

If you've been immersing yourself in literary theory lately... think of "names" and "values" of variables as corresponding to the "signifier" and "signified" in the Saussurean sign:

![The Saussurean sign](saussure.jpg)

To *put the cool contents into the envelope* (as it were), you need to **assign** a **value** to the **variable**. 

Python syntax looks like the following — `variable_name = value` — and works as follows:
* On the left, you write out the **variable name** you want to use. These names can contain uppercase and lowercase letters A–Z, numbers 0–9, and underscores (`_`), but they cannot *begin* with a number. You *cannot* use other punctuation (`.-!?` etc.), or spaces. You can use *pretty much* any name you want, though a few words are reserved for Python's own use, such as `return` and `True`, and you should also steer clear of the names of built-in functions such as `print` (these special words will appear in green text when you enter them into a Jupyter code cell).
* Then you put a *single equals sign*, `=` (don't use a double equals sign, `==`, which we'll talk about in a second). 
* Then write out the **value** that you want to **assign** to the variable (aka, the stuff you want to put into the envelope). If it's text you want to put into the variable, surround it in quotation marks (`"` — or `'` if you're British, either works!). If it's a number, just write the number.

You may at some point hear that it's "cool" or "efficient" to use really short variable names like `f` or `n` — or you might see some " *cool* " coder using super short names, or catch wind of a rumour that all the " *tough* " programmers use short variable names. Nonsense!!! We know lots of words, so let's not be afraid to use them! A descriptive variable name — one that actually makes clear what kind of stuff is inside the envelope — makes your code more readable (to yourself and others) and reinforces your sense of what's actually going on in your code.

Note: if you're going to use more than one word for your variable name, it is conventional and reader-friendly to use an underscore `_` to separate these words, rather than just mashing them together. This is called (get ready for it) **snake_case**!
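
To make the naming advice concrete, here is a small sketch. The variable names and values are made up purely for illustration; the point is only the contrast between descriptive snake_case names and terse ones.

```python
# Descriptive snake_case names say what is inside the envelope.
novel_title = "Frankenstein"
copies_on_shelf = 3          # made-up number, for illustration only

# These would also run, but they tell a reader much less.
t = "Frankenstein"
c = 3

# Names are case-sensitive, so these are two different envelopes.
Poet = "Dickinson"
poet = "Whitman"

print(novel_title, copies_on_shelf)
print(Poet, poet)
```

Both versions hold exactly the same values; only the readability differs.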

Below, I'll put my name into a variable called... `name`!

In [None]:
name = "Adam"

Okay — although there's no output there, the variable has now been created, and assigned that value! 

How can we check that there is now a variable called `name` that contains my name, `"Adam"`? 

Well, we can just type `name` into a code cell. Python will interpret that command, realize that it makes sense, go find the "envelope" called `name` and spit out its contents. (In Saussurean terms, it will understand the *signifier* `name` and then spit out its *signified*, `"Adam"`.)

In [None]:
name

Let's try the same thing with a number rather than text...

In [None]:
age = 41

In [None]:
age

Note that you can perform mathematical operations on the contents of this variable just by "calling" it by name. (In class, we'll discuss how this would differ if you inputted `"age" * 2` instead.)

In [None]:
age * 2

You can use decimals, too...

In [None]:
batting_average = 0.190

In [None]:
batting_average

In [None]:
batting_average * 1000

In [None]:
batting_average_2 = 190

In [None]:
batting_average_2

As you'll perhaps have noticed above, Python treats decimal numbers and whole numbers differently — it considers them different **data types**. We'll get to that in a second. 

But first, let's note another kind of thing we can stick into our variable envelopes (there are even more of these, which we'll be meeting in future weeks!). 

In [None]:
l0ves_T0r0nt0 = "True"
loves_Toronto = True

In [None]:
l0ves_T0r0nt0

In [None]:
loves_Toronto

## 3b The Concept of a Function

So far we've been checking the contents of our variable envelopes by just typing their names into a code cell. 

Another way of doing this is to use a **function** called `print()`. Functions are little bundles of code. Think of them as like **verbs** — they perform actions. But they're *transitive* verbs, meaning they need a "noun" to perform their actions on.

The syntax for a function is `verb(noun)`: the function name, then its noun wrapped in parentheses, conventionally with no space in between.

The Python terminology for these things is `function(argument)`.

You can, for example, provide some actual words (bundled in quotation marks so that Python knows what you're talking about) for the `print()` function to spit out. Below is the classic "first thing you ever type into a programming language" command — the famous `print("Hello world")`

In [None]:
print("Hello world")

We can also provide a variable in the "noun" or *argument* position (no quotation marks, so Python knows that we're talking about a variable), and the `print()` function will **print out** the contents of that variable.

As always, the syntax is `verb(noun)`, and getting any part of that syntax wrong will cause the Python interpreter to return an error message:
* name of the function misspelt, even slightly
* name of variable misspelt, even slightly
* parentheses left out, with only a space between function name and variable name
* missing parenthesis, or parenthesis pointing in wrong direction
* curly or square bracket used rather than parentheses. 

Think of the Python interpreter as an extraordinarily dutiful but uncompromisingly literal reader (sort of like the way Watson thinks of Holmes, even if Watson is wrong about that).

In [None]:
prnit(name)

In [None]:
print(ag3)

In [None]:
print batting_average

In [None]:
print(loves_Toronto

In [None]:
print(loves_Toronto)
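
Misspelling a function or variable name produces what Python calls a `NameError`. The sketch below shows one way to look at such an error without stopping the whole cell. It relies on `try` / `except`, which is not part of this lecture, and the misspelt name `nmae` and the variable `message` are invented for the example.

```python
name = "Adam"

# try/except is used here only so the cell keeps running after the error
# and we can read the message Python produces.
try:
    print(nmae)              # variable name misspelt on purpose
except NameError as error:
    message = str(error)
    print("Python says:", message)

print(name)                  # spelled correctly, so this works
```

Reading the error message closely is usually the quickest route to the typo.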

Below I've spelled everything correctly, and I've conveniently put all five `print()` statements into a single cell.

What differences, if any, do you notice between the output we get when we just type the name of a variable into a code cell and run it, versus when we use `print()`?

In [None]:
print(name)
print(age)
print(batting_average)
print(l0ves_T0r0nt0)
print(loves_Toronto)

By the way, look what happens when we type all five variable names on their own lines in a single code cell...

In [None]:
name
age
batting_average
loves_Toronto
l0ves_T0r0nt0
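
One way to think about the difference: when a cell ends with a bare variable name, Jupyter displays the value of that last line only, in a "code-like" form, whereas `print()` shows just the contents, every time it is called. The sketch below imitates the code-like form with the built-in `repr()` function, which we haven't formally met, so take it as an approximation of what the notebook does rather than a full account.

```python
name = "Adam"

# print() shows the string's contents, without quotation marks.
print(name)

# repr() gives the code-like form of a value; for a string,
# that form includes the quotation marks.
print(repr(name))
```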

## 3c The Four Basic Data Types

Without perhaps realizing it, you have now met the four basic *data types* that we'll be using in Python. They are:

* **Strings** `str`: These are what we would call "words." They consist of any number of letters, numbers, or symbols *strung together* (get it?). We can do lots of fun things with strings, but we can't treat them like numbers and perform mathematical operations on them, except where "polymorphism" applies (see last lecture).
* **Integers** `int`: These are what I would call whole numbers. They're numbers without decimal points. They can be positive or negative. We can treat them like numbers (which they are!) and perform mathematical operations on them.
* **Floating Point Numbers** `float`: These are numbers, but with decimal points! Although I/AH personally found this piece of terminology unfamiliar when I first started programming, I have come to appreciate it as a cool phrase. I am clearly not alone in this: the electronic musician Floating Points took it as his stage name, and made a great album with Pharoah Sanders called *Promises*.
![Promises by Floating Points and Pharoah Sanders](floatingpoints.jpg)

* **Booleans** `bool`: These can have only two values: `True` or `False`. They are, in other words, **binaries**. As humanities people we are naturally suspicious of binaries, for good reason. But since we're humanities data scientists, we still need to use Booleans and understand how they work!

Let's use another handy Python **function**, `type()`, to identify the **types** of all the variables we've created thus far.

In [None]:
type(name)

In [None]:
print(name)
type(name)

In [None]:
print(age)
type(age)

In [None]:
print(batting_average)
type(batting_average)

In [None]:
print(loves_Toronto)
type(loves_Toronto)

In [None]:
print(l0ves_T0r0nt0)
type(l0ves_T0r0nt0)
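
Here is a short sketch that puts `print()` and `type()` together, so that several types show up in a single cell. It reuses the variables from this lecture; `scaled_average` is a new name introduced just for this example.

```python
batting_average = 0.190
batting_average_2 = 190
l0ves_T0r0nt0 = "True"       # quotation marks make this a string
loves_Toronto = True         # no quotation marks: a Boolean

scaled_average = batting_average * 1000

print(scaled_average, type(scaled_average))        # multiplying a float gives a float
print(batting_average_2, type(batting_average_2))  # written without a decimal point: an int
print(l0ves_T0r0nt0, type(l0ves_T0r0nt0))
print(loves_Toronto, type(loves_Toronto))
```

Notice that the last two lines print the same word but report different types.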

## 3d Operators — Including the Mysterious `==`

"Operator" is another word I love. I especially love it as an insult for someone who seems always to be angling to gain some advantage: "He is such an *operator*!" 

But we're not talking about those kinds of operators. Instead, we're talking about mathematical or logical operators. We already met a bunch last time, without having the fun vocabulary term "operator". These were:
* `+`: addition
* `-`: subtraction
* `*`: multiplication
* `/`: division

These all work on variables just as they would on directly-inputted numbers or text...

In [None]:
students_per_tutorial = 20
number_of_tutorials = 8

In [None]:
students_per_tutorial * number_of_tutorials

In [None]:
max_students = students_per_tutorial * number_of_tutorials

In [None]:
print(max_students)

In [None]:
name * 40
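
The sketch below lines up the same operators on numbers and on strings, using the `name` and `age` variables from earlier. It is meant as a quick reference, not an exhaustive list of what each operator can do.

```python
name = "Adam"
age = 41

# With numbers, + and * do arithmetic.
print(age + 1)
print(age * 2)

# With strings, + joins and * repeats.
print(name + " " + name)
print(name * 3)

# Division with / gives back a float, even when both numbers are integers.
print(type(age / 1))

# Quotation marks matter: "age" is a three-letter string, not the variable.
print("age" * 2)
```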

Now let's meet a few more operators:
* `==`: equal to
* `!=`: not equal to
* `>`: greater than
* `>=`: greater than or equal to
* `<`: less than
* `<=`: less than or equal to

Obviously, the stilted phrasing of the above definitions ("equal *to... something*") suggests that this second set of operators is used to **compare** two things. In the following examples, we'll be using them to ask the Python interpreter some questions, have the Python interpreter **evaluate** those questions, and have it give us back answers of `True` or `False`.

(Can anyone name the **data type** in which Python will be answering our questions??)

In [None]:
age != 100

In [None]:
age > 30

In [None]:
age < 30

In [None]:
age < 50

In [None]:
age > 41

In [None]:
age >= 41

In [None]:
batting_average > .300

In [None]:
batting_average >= .190

In [None]:
age == 41

Now that statement up there ^^^^ — `age == 41` — is a **major possible source of confusion** for people new to the Python programming language (i.e., all of us). 

The confusion comes from the fact that we all think of this symbol, `=`, as an "equals sign". **But that isn't quite what it means in Python.**

For instance, how is the statement above different from the statement below?

In [None]:
age = 41

To my way of thinking, the real "equals sign" in Python is `==`. This thing, `=`, is more like the "put the things to the right of me into a variable named whatever is to the left of me" sign. 

However you personally think about it, **it's really important that you become comfortable with the difference between `==` and `=` in Python**. It will take a while to internalize it, and you **will** make this mistake in your coding at some point... but once you get ahold of it, it will be yours forever!

In [None]:
"Adam" == "Adam"

In [None]:
Adam = "Adam"

In [None]:
Adam == "Adam"

In [None]:
print(loves_Toronto)
loves_Toronto == True

In [None]:
type(loves_Toronto)

In [None]:
print(l0ves_T0r0nt0)
l0ves_T0r0nt0 == True

In [None]:
type(l0ves_T0r0nt0)
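
To see `=` and `==` working side by side, here is a small sketch. The variable name `is_forty_one` is invented for this example; everything else comes from the cells above.

```python
age = 41                     # single =: put 41 into the envelope called age
is_forty_one = age == 41     # double ==: ask a question, then store the answer

print(is_forty_one)
print(type(is_forty_one))

# A string that merely looks like a Boolean is not equal to one.
print("True" == True)
```

The second line reads right to left: Python first evaluates the comparison `age == 41`, then assigns the result to `is_forty_one`.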

## 3e Printing "f-Strings"

Below are some cells containing `print` statements for a special kind of **string** called an **f-String**. See if you can figure out the syntax of this cool and handy variation of a normal `print(string)` statement. Play around with it and see if you can break it... then fix it again!

In [None]:
cat1 = "Rosie"
cat2 = "Jazz"
print(f"I have two cats: {cat2} and {cat1}.")

In [None]:
print(f"My name is {name} and I am {age} years old. My batting average in softball is {batting_average}. It is {loves_Toronto} that I love the city of Toronto.")
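
If you'd like a few more f-strings to poke at, here is a sketch of some variations. It reuses the variables from this lecture. The last example uses a "format specifier" (the part after the colon), which goes a little beyond what we cover here.

```python
name = "Adam"
age = 41
batting_average = 0.190

# The f before the opening quotation mark switches on the curly braces.
print(f"{name} is {age} years old.")

# Without the f, the braces are printed literally.
print("{name} is {age} years old.")

# Braces can hold a calculation, not just a variable name.
print(f"Next year {name} will be {age + 1}.")

# An optional format after a colon controls how a number is shown;
# .3f means three digits after the decimal point.
print(f"Batting average: {batting_average:.3f}")
```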

## 3f One More Important Python Thing Before We Finish: Comments (`#` and `"""`)


Although we mostly use **Markdown cells** to include written instructions or comments in Jupyter Notebooks, sometimes you'll want to add a comment to a code cell: to help someone interpret your code, to help yourself remember what you were doing, or for any number of reasons!

You'll find some comments in the code cells in your Lab and Homework files this week. Here's how they work.

On an individual line of code, inserting the hash `#` symbol tells Python to ignore anything that comes after that point in that line of code.

In [None]:
# In a code cell, Python will ignore this whole line of code, since there's nothing before the "#"

In [None]:
cat_sound = "meow" # Whereas in this line, it will treat everything to the left of the # as code, and ignore everything to the right 
print(cat_sound)

In [None]:
# print(cat_sound)  # here, I put the # before the command, so Python ignores that instruction.

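You can also write multi-line comments by using triple quotation marks (`"""`). For these, write `"""` on a new line of code, then write your multi-line comment, then write another `"""` on a new line of code. (Strictly speaking, this creates a multi-line *string* that Python reads and then throws away because it is never assigned to a variable, which is why it works as a comment.)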

In [None]:
"""
Here is an awesome bit of Python coding in which I record the sounds that the following animals make:
-- cats
-- dogs
-- lizards
Then I reveal these sounds to the unsuspecting public.

Python knows to ignore these opening lines of commented-out code because they come between triple-quotation-marks
"""
cat_sound = "meow"
dog_sound = "bark"
lizard_sound = "sizzle sizzle" # I bet you didn't see that one coming!

print(f"After many years of research, I am finally ready to share my findings regarding animal sounds: cats go '{cat_sound}', dogs go '{dog_sound}', and lizards — somewhat surprisingly! — go '{lizard_sound}.'")
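
A small sketch of the difference between the two comment styles: a `#` comment is skipped entirely, while text between triple quotation marks is really a string, which you can see by storing it in a variable. The variable name `note` is invented for this example.

```python
# A hash comment like this one is skipped entirely by Python.

note = """
Triple quotation marks actually create a string
that can run across several lines.
"""

# Because it is a string, it can be stored, printed, and inspected.
print(type(note))
print(note)
```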

## 3g ROLE PLAYING!

Hopefully we've left lots and lots of time for the fun role-playing game that I/AH spent so much time designing...

In [None]:
# We've left this blank code cell for you to experiment with after lecture.
# Please feel free to try things out here, and in tutorial ask any questions that come up!

