# Lesson 2 - Understanding Variables

**Overview**

In this lesson you will learn the basic types of variables in Python and most other programming languages. You will also learn how to convert between variable types in order to deal with one of the most common errors for beginning programmers.


## Computation and Variables

Let's start with something simple: making the computer do math for us!

**Think of Python like a very smart calculator** - you can tell it to add numbers, and it will give you the answer.

**Tip**: Remember you can run the cell below in two ways:

✅ **Option 1:** Click the ▶️ play button at the top left of the cell  
✅ **Option 2:** Press `Ctrl+Enter` to run the cell, or `Shift+Enter` to run it and move to the next cell
  
The answer will appear below the cell.

In [15]:
2+1

3

## What Are Variables?

Adding specific numbers like `2 + 1` is fine, but the real magic happens when we use **variables**.

**Think of a variable like a labeled box** where you can store information. 

In the example below:
- We put the number `2` in a box labeled `x`  
- Then we can use `x` instead of writing `2` again
- `x + 1` gives us the same result as `2 + 1`

**Why is this useful?** You'll see in a moment! 🎯

In [5]:
x = 2
x + 1

3

## How Variables Work

The `=` sign in Python means **"store this value"** - it's like putting something in a labeled box.

```python
x = 2
```

**Read this as:** "Store the number 2 in a box labeled x"

⚠️ **Important:** In Python, `=` does NOT mean "equals" like in math class. It means "store" or "assign."
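
To see why "assign" is a better reading than "equals", consider the short sketch below. The variable name `count` is only an illustration and is not used elsewhere in this lesson.

```python
# Store 2 in the box labeled count
count = 2

# Take what is in the box, add 1, and store the result back in the same box.
# In math class "count = count + 1" would make no sense,
# but as an assignment it is perfectly fine.
count = count + 1

print(count)  # prints 3
```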

### Why Variables

It may not be entirely obvious why you would want to store things in a variable. If you want to add `2` + `1`, storing `2` in the variable `x` seems like an extra step. The reason for doing so is that you do not always know what the information is going to be. 

**Example: Calculating Area of a Rectangle**

Imagine you need to calculate the area of different rectangles. Without variables, you'd have to write:
- Room 1: `12 * 8 = 96`  
- Room 2: `15 * 10 = 150`
- Room 3: `9 * 12 = 108`

But with variables, you can write the formula once and reuse it:

```python
length = 12
width = 8
area = length * width
```

Now you can easily change the `length` and `width` values to calculate any rectangle's area. This becomes incredibly powerful when you're processing hundreds of rooms in a building or thousands of data points!
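
As a rough sketch of that reuse, the cell below calculates the first two rooms from the list above. Only the stored values change; the formula line stays exactly the same.

```python
# Room 1
length = 12
width = 8
area = length * width
print(area)  # prints 96

# Room 2: change the values, reuse the same formula
length = 15
width = 10
area = length * width
print(area)  # prints 150
```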

### Types of Variables

Python can store different **types** of information in variables. Here are the four most important types you need to know:

1. **Integer** - Whole numbers (like 1, 42, -5)
2. **Float** - Numbers with decimals (like 3.14, 2.0, -1.5)  
3. **String** - Text (like "Hello", "James Madison University")
4. **Boolean** - True or False

Python automatically figures out what type of data you're storing. This makes Python easier to use than many other programming languages!

In [6]:
# Integer: no decimal point
a = 1

# Float: has a decimal point
b = 2.0

# String: wrapped in quote marks
c = "Three"

# Boolean: the reserved words True or False
d = True

**Want to check what type a variable is?** Use the `type()` function:

In [7]:
type(a)

int

In [8]:
type(b)

float

In [9]:
type(c)

str

In [10]:
type(d)

bool

## Common Errors 🚨
The most common mistake beginners make is a `TypeError`. This happens when you try to mix incompatible data types.

✅ **This works:** I can add `a` and `b` because they are both numbers.

In [None]:
a + b

3.0

**Tip:** If you get the `NameError` shown below, Python could not find a variable. This is probably because you did not run the cells above. You can fix this by going back and running those cells one at a time, or by clicking the `Execute Above Cells` button.

![Execute Above Cells](images/execute_above_cells.png)

This will run all the code upstream.

**1. NameError - Variable not defined:**
```python
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 a + b

NameError: name 'a' is not defined
```

### TypeError

❌ **This won't work:** If I try to add `a` + `c` it will fail because they are different types. 

**Think of it like this:** 1 + "Three" doesn't make sense to a computer. It's like asking "What is 1 + Blue?"

### Task 1:
Create a new cell and add `a` + `c` and see what happens.

**Creating a new cell**

Click on the **+Code** button at the top of the toolbar.
![add code block](images/add_code_block.png)



In [None]:
#Create Code Here


TypeError: unsupported operand type(s) for +: 'int' and 'str'

You should see an error that looks like this:

```python
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

**What does this mean?** 
- Python is saying: "I can't add an integer (whole number) and a string (text)"
- This isn't a silly mistake! When working with real data, numbers sometimes get saved as text by accident
- Always check your data types when you get strange errors

**Why this matters:** Imagine you have a spreadsheet where some numbers are saved as text. Your calculations won't work until you fix the data types!
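
One way to follow that advice is to check each variable with `type()` before combining them. The sketch below uses two made-up variables, `price` and `quantity`, that are not part of this lesson's other cells.

```python
price = "20"   # saved as text by accident (note the quotes)
quantity = 3   # a real number

print(type(price))     # prints <class 'str'>
print(type(quantity))  # prints <class 'int'>
```

When you use `print()`, the type is shown as `<class 'str'>` rather than just `str`, but it means the same thing. Seeing `str` where you expected a number tells you where the problem is.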

**Here's a tricky example:**

Run the cell below and see what happens:

In [None]:
text1 = "Maroon"
text2 = "5"
text1 + text2

'Maroon5'

**Surprise!** This actually works! ✅ 

When you "add" two strings together, you get **string concatenation** (fancy word for "joining text together"). Subtraction has no such text meaning, though. Run the line below:

In [None]:
text1 - text2

TypeError: unsupported operand type(s) for -: 'str' and 'str'

We get the following error: 

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 text1 - text2

TypeError: unsupported operand type(s) for -: 'str' and 'str'
```

**What does this error mean in English?**
- Python is telling us that the `-` (subtraction) operator doesn't work between two strings (`'str'` and `'str'`)
- Unlike `+` which concatenates strings, there's no built-in way to "subtract" text

**How to get help:**
- Read the error message carefully - it tells you exactly what went wrong
- The error type (`TypeError`) indicates a data type problem
- Look at what types are involved (`'str' and 'str'`) and what operation failed (`-`)

### Numbers vs "Numbers"
Sometimes numbers look like numbers, but they're actually text! Can you spot the problem below?

In [None]:
number1 = 567
number2 = "5309"
number1 + number2

TypeError: unsupported operand type(s) for +: 'int' and 'str'

**Problem Alert!** ❌

This gives us a TypeError because:
- `567` is a number (integer)  
- `"5309"` is text (string) - notice the quotes!

Even though "5309" looks like a number, the quotes make Python treat it as text. You can't do math with text!

### Type Conversion 🔄

**Good news!** You can convert between data types using special functions:

- `int()` → Convert to whole number
- `float()` → Convert to decimal number  
- `str()` → Convert to text
- `bool()` → Convert to True/False

**How to use:** Put your variable inside the parentheses: `int("5309")`

**Now our math works!** ✅ By converting the string to an integer:

In [10]:
number1 = 567
number2 = "5309"
number1 + int(number2)

5876
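
The conversion functions listed above can be sketched as follows. The variable names here (`whole`, `decimal`, `text`, `flag`) are invented for illustration.

```python
whole = int("5309")      # text to integer: 5309
decimal = float("3.14")  # text to float: 3.14
text = str(567)          # integer to text: "567"
flag = bool(0)           # 0 converts to False; any other number converts to True

print(whole + 1)   # prints 5310
print(text + "!")  # prints 567!

# Not every string can be converted. The line below would raise a
# ValueError because of the comma, so it is left commented out:
# float("8788,87")
```

Conversion only works when the text really looks like the target type. A string such as `"8788,87"` or `"Five Three Zero Nine"` has to be cleaned up first.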

### 🧩 Practice Time! 
**Challenge:** Can you find another way to combine `number1` and `number2` using type conversion? 

*Hint: What if you convert `number1` to a string instead?*

In [4]:
#Enter Code Here
number1 = 567
number2 = "5309"
str(number1) + number2

'5675309'

## Arrays and Lists

One of the most important compound data types in data science is the array, which in Python is usually called a **list**. Python also has things actually called **arrays**, but these are used specifically when all of the data is of the same type. To avoid confusion, this is the last time we'll use the word array; from here on we'll simply refer to lists.

A list is a collection of data that has been placed in a particular order through an index. For example, a list of names:

```python
list1 = ["JMU", "UVA", "Virginia Tech", "William and Mary"]
```





### Accessing List Elements

You can access individual values in a list by using its index. The index always starts at `0`, so the last index value for this list is `3` even though there are four items. We access a value by giving the name of the list and the index in brackets.

In [None]:
list1 = ["JMU", "UVA", "Virginia Tech", "William and Mary"]

What value will appear if I type `list1[2]`?

In [None]:
list1[2]

'Virginia Tech'

You can also count from the right end of a list by putting a `-` in front of the index. This means that `-1` is the last item in the list.

What will the following code produce?

In [None]:
list2 = ["JMU", "UVA", "Virginia Tech", "William and Mary"]
list3 = [19934, 24000, 31000, 8500]
list4 = [18904, 22000, 30000, 8000, 8095]
# list3[-1] gives the same item as list3[len(list3)-1]
list3[-1]

8500
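
Here is a small sketch of negative indexing on a list of names. The list name `schools` is made up for this example, and `len()` is explained later in this lesson.

```python
schools = ["JMU", "UVA", "Virginia Tech", "William and Mary"]

last = schools[-1]         # counting from the right: the last item
second_last = schools[-2]  # the item just before the last one

# A negative index is a shortcut for counting back from the length
same_as_last = schools[len(schools) - 1]

print(last)         # prints William and Mary
print(second_last)  # prints Virginia Tech
```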


### Slicing Lists

Sometimes you may want to get several values from a list. This is called "slicing". The syntax is `list[initial_value : end_value]`, where `initial_value` is the index where you want to start and `end_value` is the index where you want to stop. The item at `end_value` itself is not included.
For example, `list1[0:2]` gives the first two items: it starts at index 0 (JMU), includes index 1 (UVA), and stops before index 2.

In [None]:
list1[0:2]

['JMU', 'UVA']

In [None]:
#We can get the last three values by slicing from position 1 to position 4.
list1[1:4]


['UVA', 'Virginia Tech', 'William and Mary']

If you leave the initial value empty it will automatically start from 0. If you leave the end value empty, it will automatically go to the very end. We can get the first three and the last three values in this way.

In [None]:
list1[:3]

['JMU', 'UVA', 'Virginia Tech']

In [None]:
list1[1:]

['UVA', 'Virginia Tech', 'William and Mary']
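
Slices can also be combined with the negative indexes from the previous section. The sketch below uses a made-up list name, `schools`, and shows that slicing gives you a new list while leaving the original alone.

```python
schools = ["JMU", "UVA", "Virginia Tech", "William and Mary"]

middle = schools[1:3]    # indexes 1 and 2; index 3 is not included
last_two = schools[-2:]  # start two from the end, go to the very end

print(middle)    # prints ['UVA', 'Virginia Tech']
print(last_two)  # prints ['Virginia Tech', 'William and Mary']
print(schools)   # the original list still has all four items
```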

### List Len()

One common operation is to find the length of a list. If you're new to programming this may not seem very useful because you can just look at the list and count, but it matters once a list is too long to count by eye or when its contents change while the program runs. You can use the `len()` function to find the length of the list.

What is the answer to the code below?

In [None]:
len(list1)

4
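
A few more illustrative uses of `len()` are sketched below. The names `schools` and `count` are invented for this example.

```python
schools = ["JMU", "UVA", "Virginia Tech", "William and Mary"]

# Store the length so it can be reused
count = len(schools)

# len() returns an integer, so convert it before joining it with text
print("There are " + str(count) + " schools")  # prints There are 4 schools

# len() also works on strings (it counts characters) and on empty lists
print(len("JMU"))  # prints 3
print(len([]))     # prints 0
```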

#### Task 3: Adding List Lengths

Now imagine we have another list of universities.

If we wanted to find the total length of list1 and list2 combined how would we go about it?

In [None]:
#write your answer below and run it.
list1 = ["JMU", "UVA", "Virginia Tech", "William and Mary"]
list2 = ["Harvard", "Yale", "Boston College","Princeton", "Duke"]

### Nested Lists

You can also put lists inside other lists. The result is called a nested list.

In [None]:
list3 = [list1,["Harvard", "Yale", "Boston College","Princeton", "Duke"]]

print(*list3, sep='\n')

['JMU', 'UVA', 'Virginia Tech', 'William and Mary']
['Harvard', 'Yale', 'Boston College', 'Princeton', 'Duke']


*Note*: There are two lists combined in one list.

We can access a nested list value by nesting the brackets. For example, if I want to access the **first** value on the **first** list this would be:

In [None]:
list3[0][0]

'JMU'

If we want to access the **second** value on the **second** list we could do the following:

In [None]:
list3[1][1]

'Yale'

#### Task 4: Access a nested value

What would be the code for picking "Virginia Tech"?

The use case for creating, adding, and deleting lists seems rather limited now, but once you start working within data tables it will become obvious why you would want to do this.

## Iteration

One of the most common features of any programming language is the ability to iterate over data, that is, to repeat the same steps for each item. The most common loop is the trusty `for` loop. It consists of a header that sets what to loop over and a body that does something on each iteration. In Python, `for` loops are pretty easy. The one thing to watch is that the body must be indented consistently (four spaces is the convention), or the loop will not work.

In [None]:
#the following loop goes through all the items in the list and prints each one in upper case.
list4= ["jmu", "uva", "virginia tech", "william and mary"]


for x in list4:
    print(x.upper())   

list5 = ["jmu","JMU","uva","UVA","virginia tech","Virginia Tech","william and mary","William and Mary"]

JMU
UVA
VIRGINIA TECH
WILLIAM AND MARY


#### Task 5: Convert to Lower Case

The method for converting a string to lower case is `.lower()`. How would I write a `for` loop to convert all list values in **list5** to lower case?

In [None]:
# Test your answer here
list5 = ["JMU", "UVA", "VIRGINIA TECH", "WILLIAM AND MARY"]

## Functions

Functions are a way to store a procedure for later use. If variables store values, functions store things you do to variables. When doing corpus linguistics you'll mostly be using functions other people have already created, but it is useful to know how functions work to understand what you are actually doing with your data. 
Functions consist of two parts: 
1. **Definition** - Indicates what the function does
2. **Execution** - When the function is actually used 

### Definition

A function is defined through the `def` statement, function name, any parameters, and a function body.

```python
def HelloWorld():
    print("Hello World")
```

The above function just prints the text "Hello World". We can call it by using the function name: `HelloWorld()`

In [None]:
def HelloWorld():
    print("Hello World")

In [None]:
HelloWorld()

Hello World


This is not a very useful function. We can also give it parameters and have it do something with the parameters.

```python
def HelloName(name):
    print("Hello " + name)
```

We have created the parameter `name`. Now when we execute the function we can pass in a name as the **argument**, and it will print "Hello " followed by that name. For example, `HelloName("Kelly")` prints `Hello Kelly`.
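
Here is that function defined and called twice. The parameter `name` is written once in the definition, while a different argument is supplied on each call.

```python
def HelloName(name):
    # name is the parameter: a placeholder for whatever gets passed in
    print("Hello " + name)

HelloName("Kelly")   # "Kelly" is the argument; prints Hello Kelly
HelloName("George")  # prints Hello George
```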

We can also `return` a value.

```python
def AddHello(name):
    newname = "Hello " + name
    return newname
```

This function puts "Hello " in front of whatever string is passed in, stores the result in `newname`, and returns it.

So far not very useful.

In [None]:
def AddHello(name):
    newname = "Hello " + name
    return newname

In [None]:
HelloGeorge = AddHello("George")
print(HelloGeorge)

Hello George
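
The difference between printing and returning is easy to miss, so here is a small side-by-side sketch. The variable names `printed` and `returned` are invented for this example.

```python
def HelloName(name):
    print("Hello " + name)   # shows the text but hands nothing back

def AddHello(name):
    return "Hello " + name   # hands the string back to the caller

printed = HelloName("Kelly")  # prints Hello Kelly
returned = AddHello("Kelly")  # prints nothing

print(printed)           # prints None, because nothing was returned
print(returned)          # prints Hello Kelly
print(returned.upper())  # prints HELLO KELLY
```

A returned value can be stored and used again later, which is why most of the functions you write for data work will use `return`.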


### Execution

Functions become useful when they take on tasks that would require a lot of repeated code.<br> For example, if you have been given a list of names but not everyone entered their name consistently, you can write a function to clean up the list. This could involve capitalizing every name, removing any special characters, and sorting the list alphabetically.

In [None]:
def cleanList(listnames):
    #create a new empty list that will store the revised values
    newlist = []
    #loop through every name in the provided list
    for name in listnames:
        #create a temporary variable for name
        tempname = name
        #change name to title case
        tempname = tempname.title()
        #edit out any special characters in tempname
        tempname = ''.join(char for char in tempname if char.isalnum() or char.isspace())
        #add modified name to newlist
        newlist.append(tempname)
    #return the list sorted
    return sorted(newlist)
    

In [None]:
list6 = ["Ziggy%","Bob", "Rich$", "sam", "HANK!!!", "T-Dubbs"]

In [None]:
list7 = cleanList(list6)

In [None]:
list7

['Bob', 'Hank', 'Rich', 'Sam', 'TDubbs', 'Ziggy']
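
To see what happens inside the loop, the sketch below walks one name through the same two cleaning steps used in `cleanList`. The variable names `titled` and `cleaned` are made up for illustration.

```python
name = "HANK!!!"

# Step 1: title case turns the first letter upper case and the rest lower case
titled = name.title()
print(titled)   # prints Hank!!!

# Step 2: keep only letters, digits, and spaces, then join them back together
cleaned = ''.join(char for char in titled if char.isalnum() or char.isspace())
print(cleaned)  # prints Hank
```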

#### Task 7

Imagine that you have a program that uses a list of phone numbers provided by users. Unfortunately, not every user was equally diligent in how they entered their phone number. The code required to convert a 10 digit string to a phone number format is as follows:<br>

```python
phonenumber = ''.join(number for number in phonenumber if number.isdigit())
phonenumber = "({}) {}-{}".format(phonenumber[:3], phonenumber[3:6], phonenumber[6:])
```

**Note:** The above code builds on things we have already learned: a `for` loop (here written on one line inside `join()`) and slicing, which works on strings the same way it works on lists.
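
As a rough walk-through, here are the same two lines applied to a single phone number, with the intermediate result stored in a separate variable. The names `digits` and `formatted` are invented for this sketch.

```python
phonenumber = "(413)467-8900"

# Step 1: keep only the digit characters
digits = ''.join(number for number in phonenumber if number.isdigit())
print(digits)     # prints 4134678900

# Step 2: slice the ten digits into three groups and drop them into the pattern
formatted = "({}) {}-{}".format(digits[:3], digits[3:6], digits[6:])
print(formatted)  # prints (413) 467-8900
```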

Create a function called `cleanPhonenumber` that takes a list of unformatted phone numbers, goes through each phone number to standardize the format using the code above, and then returns the new list of phone numbers.

In [19]:
dirty_phonenumbers = ["5678761990","(413)467-8900","212 340 5678", "817-999-7788"]

In [20]:
#The function below returns a list of cleaned phone numbers in the format (XXX) XXX-XXXX.
def cleanPhonenumber(phonenumbers):
    newlist = []
    for phonenumber in phonenumbers:
        phonenumber = ''.join(number for number in phonenumber if number.isdigit())
        phonenumber = "({}) {}-{}".format(phonenumber[:3], phonenumber[3:6], phonenumber[6:])
        newlist.append(phonenumber)
    return newlist

In [21]:
#call your function here
cleaned_phonenumbers = cleanPhonenumber(dirty_phonenumbers)
cleaned_phonenumbers

['(567) 876-1990', '(413) 467-8900', '(212) 340-5678', '(817) 999-7788']

### Methods

A **method** is a special type of function that "belongs to" a specific data type. Think of methods as built-in tools that come attached to different types of data.

**The key difference:**
- **Functions** stand alone: `len(my_list)` or `type(my_variable)`  
- **Methods** are attached to data: `my_string.lower()` or `my_list.append()`

**Why the dot notation?**
Methods use a dot (`.`) because they're asking the data itself to do something. When you write `name.lower()`, you're essentially saying "Hey name, please convert yourself to lowercase."

**We've already used methods!**
Remember when we used `.upper()` and `.lower()` on strings? Those were methods! 

```python
university = "james madison university"
university.upper()  # Method: asks the string to become uppercase
```
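
The sketch below puts a function and a method side by side on the same string. The variable names `length` and `shouted` are made up for illustration.

```python
university = "james madison university"

# Function: stands alone and takes the data as an argument
length = len(university)

# Method: attached to the data with a dot
shouted = university.upper()

print(length)      # prints 24
print(shouted)     # prints JAMES MADISON UNIVERSITY
print(university)  # prints james madison university (the original is unchanged)
```

String methods hand back a new string rather than changing the original, so store the result in a variable if you want to keep it.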

**Example:** If `.lower()` makes everything lowercase, what do you think `.title()` will do?

### Task 8: Use `.title()` method

Run the `.title()` method below.

In [None]:
name = "JAMES MADISON UNIVERSITY"



'James Madison University'

We will be using methods throughout this project, so it is important to understand that whenever you see the notation `.methodname()`, a function is being run on the data in front of the dot.

**Common String Methods You'll Use:**
- `.lower()` - converts to lowercase
- `.upper()` - converts to UPPERCASE  
- `.title()` - Converts To Title Case
- `.strip()` - removes extra spaces from beginning and end
- `.replace("old", "new")` - replaces text

**Why methods matter in digital studies:**
When you're processing large amounts of text (like historical documents, social media posts, or survey responses), methods let you quickly clean and standardize your data. Instead of writing complex code, you can simply ask the text to clean itself!

```python
messy_text = "  COLONIAL WILLIAMSBURG  "
clean_text = messy_text.strip().title()  # Chain methods together!
# Result: "Colonial Williamsburg"
```
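
The `.replace()` method from the list above can be chained in the same way. This is only a sketch, and the variable names `messy_name` and `clean_name` are invented for it.

```python
messy_name = "  william & mary  "

# strip the spaces, swap the symbol for a word, then convert to title case
clean_name = messy_name.strip().replace("&", "and").title()

print(clean_name)  # prints William And Mary
```

Notice that `.title()` capitalizes every word, including "And", so a chain like this may still need a final touch-up depending on your data.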

## 🎯 Lesson Summary

Congratulations! You've learned the fundamental building blocks of Python programming:

### **Data Types:**
- **Integers** - Whole numbers
- **Floats** - Decimal numbers  
- **Strings** - Text data
- **Booleans** - True/False values
- **Lists** - Ordered collections of data
  
### **Operations:**
- **Type conversion** - Converting between data types
- **List indexing and slicing** - Accessing parts of lists
- **Iteration** - Using `for` loops to repeat operations
- **Functions** - Creating reusable code blocks
- **Methods** - Using functions that are attached to data

### **Key Skills:**
- Debugging **TypeError** messages
- Creating and manipulating lists
- Writing functions with parameters and return values
- Using loops to process data efficiently


## 🚀 Next Steps

These fundamentals prepare you for:
- **Data analysis** with pandas
- **Text processing** for digital humanities
- **Data visualization**
- **Machine learning basics**

Keep practicing these concepts - they're the foundation for everything else in Python!