<a href="https://colab.research.google.com/github/scskalicky/LING-226-vuw/blob/main/01_Variables_and_Strings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Get your feet wet and start using some Python.

Python has a reputation among some people for being "easy" because the code almost looks like you can just type what you want. An older [XKCD](https://xkcd.com/) comic tried to make this point (although it was when Python 2.x was being used!). You might not immediately get parts of this comic, but come back later in the course and you might find it more meaningful!


> <img src = "https://imgs.xkcd.com/comics/python.png" height = 500>



Anyhow, let's put Python's reputation to the test and dive right into some basics and start using it.

In this section, there are some code cells which you do not need to modify.

1. Look at each cell and try to understand it without running the code cell. What do you think will happen when you run the code cell?
2. Run the code cell (click the Play icon) to check your prediction. Were you right?



In [None]:
print("Soda Onion")

In [None]:
name = "Soda Onion"
print(name)

In [None]:
len(name)

In [None]:
for letter in name:
  print(letter)

In [None]:
[letter for letter in name]

In [None]:
[letter for letter in reversed(name)]

In [None]:
NZ_Pop = 4868882

World_Pop = 7900000000

NZ_perc = (NZ_Pop / World_Pop) * 100

NZ_perc

In [None]:
anthem_te_reo = "E Ihowa Atua,\nO ngā iwi mātou rā,\nāta whakarongo na;\nMe aroha noa.\nKia hua ko te pai;\nKia tau tō atawhai;\nManaakitia mai\nAotearoa"
anthem_english = "God of nations at thy feet,\nIn the bonds of love we meet.\nHear our voices, we entreat\nGod defend our free land.\nGuard Pacific's triple star\nFrom the shafts of strife and war,\nMake her praises heard afar,\nGod defend New Zealand."
print(anthem_te_reo + "\n\n" + anthem_english)

# Values and Variables

The code cells above contained a range of Python *values* which were saved to different *variables*. Values are the basic pieces of information or data that Python (and other programming languages) work with.

A value can be a number, a piece of text, or other things, such as containers. Here are some of the examples used in the code cells above:

Variable | Value | Type
------|------|------
`name`|"Soda Onion"|text
`NZ_Pop`|4868882|number


The *type* of a value is actually more specific than the table above suggests. For example, numbers are classified into integers (whole numbers) or floats (decimals). And text is classified as a `string` (more on this later).

You can check the `type` of a value in Python by using the `type()` function. In the cells below, the `type` of each value is shown in the output. Knowing the type of a value is useful because some operations work on text but not on numbers, and vice versa.

In [None]:
# int = integer
type(42)

In [None]:
# float = decimal
type(4.2)

In [None]:
# str = text
type('forty-two')
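`type()` also works on variables, in which case it reports the type of the value the variable currently holds. Here is a small sketch that reuses the variables from the first section (they are redefined here so the cell runs on its own). Notice that dividing two integers gives a float.

```python
# Redefine the variables from the first section so this cell is self-contained
name = "Soda Onion"
NZ_Pop = 4868882
World_Pop = 7900000000
NZ_perc = (NZ_Pop / World_Pop) * 100

# type() reports the type of the value each variable holds
print(type(name))      # a string
print(type(NZ_Pop))    # an integer
print(type(NZ_perc))   # a float, because / always produces a float
```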

## What’s the big deal with variables?
- A value that is not saved to a variable is gone as soon as the interpreter has executed it
- Variables persist in memory (some longer than others)
- Variables can be updated based on different conditions
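
The three points above can be sketched in a few lines. The variable name `pet_name` is made up for this illustration, and the `if` statement is something we have not covered yet, so treat it as a preview rather than something you need to understand right now.

```python
# A bare value is evaluated and then discarded; nothing refers to it afterwards
"Soda Onion"

# Saving the value to a variable keeps it available for later lines and cells
pet_name = "Soda Onion"   # hypothetical variable name
print(pet_name)

# A variable can be updated; the new value replaces the old one
pet_name = "Soda"
print(pet_name)

# A variable can also be updated based on a condition
if len(pet_name) < 5:
    pet_name = pet_name + " Onion"
print(pet_name)
```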

### Assigning values to a variable
To assign a value to a variable, we use the `=` sign and the following syntax:

> `variable_name = value`

### Rules for variable names
- Do not start with numbers
- Do not use special characters (e.g., #, @)
- Do not use Python keywords (e.g., `is`, `True`, `False`)
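
If you are unsure whether a name follows these rules, Python can check for you. The sketch below is just one way to do it: it uses the string method `.isidentifier()` and the standard library's `keyword` module. The candidate names in the list are invented for this example.

```python
import keyword

# Made-up candidate names: some follow the rules, some break them
candidates = ["name", "NZ_Pop", "2nd_name", "user@name", "is", "True"]

results = {}
for candidate in candidates:
    # .isidentifier() is False for names that start with a number
    # or contain special characters
    valid_form = candidate.isidentifier()
    # keyword.iskeyword() is True for reserved words such as is, True, False
    reserved = keyword.iskeyword(candidate)
    results[candidate] = valid_form and not reserved
    print(candidate, results[candidate])
```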


## **Your Turn**

1. Below, create a code cell and save some text to a variable. Give your variable a descriptive name.

To save text to a variable, use this syntax:

> `variable_name = "text"`


2. Then, in a new code cell, use the `print()` function to print the value of your variable to the console. Use this syntax:

> `print(variable)`

# Strings

As you saw above, we are using the word `string` to represent text. You might wonder, why not just call it `text` or `words`? The reason provides us with our first glimpse into the difference between how humans and computers "see" language. Humans see groups of letters and white space as words. However, for computers, strings are defined as a **sequence of characters**. The word `string` thus represents the sequenced nature.

Think of a string like a laundry line - there is a start and an end, and the washed clothing on the line (as well as the spots without clothing) represent the characters in the string. And, there are other types of sequences in Python, so a string is a specific type of sequence, one which contains characters representing written language.

### Creating strings

You've seen above that we used quotes to surround the strings. These quotes are called `delimiters`, and they mark the start and end of a string. Crucially, the delimiter on each end must match:

  - `'` + `'` (single quotes)
  - `"` + `"` (double quotes)
  - `'''` + `'''` or `"""` + `"""` (triple quotes)

What happens when you don't have matching delimiters? Run the code cells below and compare the output.

In [None]:
print('I'm so happy to be here!')

In [None]:
print("I'm so happy to be here")
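
Having more than one kind of delimiter is handy when the text itself contains quote marks. The sketch below shows one common approach: pick a delimiter that does not appear inside the string. The example sentences are made up for illustration.

```python
# Double quotes on the outside let an apostrophe appear inside the string
greeting = "I'm so happy to be here!"

# Single quotes on the outside let double quotes appear inside the string
quote = 'She said "kia ora" to everyone.'

# Triple quotes can hold both kinds of quote and can span several lines
note = """I'm "very" happy
to be here!"""

print(greeting)
print(quote)
print(note)
```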

## **String Functions**

We can use built-in functions to perform basic operations on strings. For example:

> * `string.upper()` = return an uppercase version of a string
> * `string.lower()` = return a lowercase version of a string
> * `string.title()` = return a title case version of a string
> * `string.strip()` = strip all leading and trailing whitespace from a string
> * `string.rstrip()` = strip all trailing whitespace from a string
> * `string.lstrip()` = strip all leading whitespace from a string

*You can find a more complete list of string methods/functions [here](https://www.geeksforgeeks.org/python-string-methods/)*




In [None]:
# Print a lower case version of the string using .lower()
print('KIWI'.lower())

In [None]:
# Remove the whitespace from the front of a sentence using .lstrip()
'    look at how the whitespace in front of these words goes away!'.lstrip()
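
To compare several of these functions side by side, here is a short sketch using a made-up string with spaces on both ends. `repr()` is used so the quotes appear in the output and you can see exactly where the whitespace is. Note that none of these functions change the original string; each one returns a new string.

```python
# A made-up string with two spaces at the front and two at the end
messy = "  kia ora Aotearoa  "

print(repr(messy.upper()))    # every letter uppercase, spaces untouched
print(repr(messy.title()))    # first letter of each word uppercase
print(repr(messy.strip()))    # whitespace removed from both ends only
print(repr(messy.rstrip()))   # whitespace removed from the end only
print(repr(messy.lstrip()))   # whitespace removed from the front only

# The original string is unchanged
print(repr(messy))
```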

We can also use the Python function `len()` to count the length of a string. Since a string is a series of characters, `len()` will tell us how many characters are in a string.

In [None]:
# Count the length of a string using len()
len('kiwi')

Take note that `len()` and `print()` require putting the variable or value inside the brackets, whereas the other functions require putting the variable or string first and typing `.function()` afterwards.

The first kind of function, such as `print()` and `len()`, is not specific to any one type or class. The second kind, such as `.upper()`, is specific to a type (strings), and thus can only be used on values of that type. You can think of the `.function()` style as a tail which belongs to a specific animal. Foxes can only use fox tails, and dogs can only use dog tails. Strings can only use string functions, and other types can only use their functions.
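
You can see the tail idea in action by trying a string function on a number. In the sketch below, the `try` and `except` lines are something we have not covered yet; they are only there so the cell prints the error message instead of stopping. The variable names are made up for this example.

```python
word = "kiwi"
number = 42

# print() and type() accept values of different types
print(word, type(word))
print(number, type(number))

# .upper() belongs to strings, so it works on a string...
print(word.upper())

# ...but an integer has no .upper(), so Python raises an AttributeError
try:
    number.upper()
except AttributeError as error:
    print("That did not work:", error)
```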

Later on, this distinction becomes fuzzy when we start importing functions from different libraries, because we can use the `.function()` method to call functions from libraries, or import them directly. But the general principle still remains - functions which are specific to *something* (either a type, library, or module) can be called using the `.` notation. You can see this below, where I can import a library (called `math`) and then use the `.fsum()` function to calculate the sum of two numbers.

In the second cell, I import *just* the function, which makes it available without needing to use the `.` notation. This may be confusing right now, but you'll notice how different strategies for importing influence the way that functions are available in a notebook or a program.


In [None]:
# Import the library and then call a specific function from that library
import math
math.fsum([1,2])

In [None]:
# Import just the function from the math library
from math import fsum
fsum([1,2])

## **Your Turn**

- Create some new code cells below
- In the code cells, create some strings and try:
  - using different delimiters
  - using different functions on the strings
  - saving the strings to a variable name and then using functions






## **Unpacking Strings**

To continue the point I was making above, strings might look like sentences or phrases to us, and we can understand the differences between letters, punctuation, and whitespace. But computationally these are all equally **characters**, and as such the length of a string in Python is the number of characters it contains.
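
One way to see this for yourself is to turn a string into a list, which puts every character into its own slot. This is only a quick sketch with a made-up phrase; notice that the space and the exclamation mark each get a slot, just like the letters.

```python
phrase = "Kia ora!"

# list() splits the string into its individual characters
characters = list(phrase)
print(characters)

# len() counts every character, including the space and the punctuation
print(len(phrase))
```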

For example, note how the length of the following two strings is different, even though the phrase is identical. Can you see why?

In [None]:
# Version 1
len('underneath it all')

In [None]:
# Version 2
len('underneath it all ')

Punctuation also matters. Compare these two examples: why is the first version longer than the second?

In [None]:
len("I can't even")

In [None]:
len("I cant even")

# Discussion

Consider the way strings are stored computationally.

- What implications or challenges do you think this might pose for the computational representation of text?

- Based on your understanding of strings thus far, are there any differences between the following computational representations of text?
  - words versus letters
  - sentences versus words
  - paragraphs versus sentences
  - texts versus paragraphs

- What remaining questions do you have about variables, strings, and the functions used in this notebook?

