# Methods I: Programming and Data Analysis

## Session 02: Data Types, Statements, Comments

### Gerhard Jäger

#### (based on Johannes Dellert's slides)

November 2, 2021

### Structure of a Python Program

Basic knowledge for building your first program:

- Python modules are text files ending in `.py`
- important: all commands must be indented identically!
- script can be started via "Run" in Spyder

### Printing to Standard Output

The `print()` function:

-   is called with a text in quotation marks:

In [1]:
print("Welcome to the world of programming!") 

Welcome to the world of programming!


-   will print the text to **standard output**

-   by default, standard output will be visible on the console

### First Program

In [3]:
print("Welcome to the world of programming!")
print("Willkommen in der Welt der Programmierung!")
print("Bienvenue dans le monde de la programmation!")
print("Tervetuloa ohjelmoinnin maailmaan!")
print("Добро пожаловать в мир программирования!")
print("欢迎来到编程的世界!")

Welcome to the world of programming!
Willkommen in der Welt der Programmierung!
Bienvenue dans le monde de la programmation!
Tervetuloa ohjelmoinnin maailmaan!
Добро пожаловать в мир программирования!
欢迎来到编程的世界!


# Variables and Assignment

## String Literals

-   one way to create simple objects is to use **literals**
-   already used: **string literals** (pieces of text in quotes)
-   **string**: a sequence of characters (*Zeichenkette*)
-   rules for string literals:
    -   must be surrounded by single (`'string'`) or double quotes (`"string"`)
    -   only the other type of quotes is allowed inside a string:
        -   `'I am a "string"'` is a string
        -   `"I am a 'string'"` is also a string
        -   `"I am a "string""` is not a valid string
    -   the backslash is an escape character: in `"some\time\ago"`, `\t` is read as a tab and `\a` as a bell character, so the printed text differs from what the literal suggests
    -   to produce an actual backslash, you need to have it twice:
        `"some\\time\\ago"`

In [11]:
print("some\time\ago")

some	imego


In [12]:
print("some\\time\\ago")

some\time\ago
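A small sketch to make the effect of escape sequences measurable. The variable names are made up for illustration, and the raw-string prefix `r` is not covered above; it is shown only as an alternative to doubling every backslash.

```python
# "\t" (tab) and "\a" (bell) each count as ONE character,
# so the escaped string is shorter than its source text suggests.
escaped = "some\time\ago"
doubled = "some\\time\\ago"
raw = r"some\time\ago"   # raw string: backslashes are taken literally

print(len(escaped))    # 11
print(len(doubled))    # 13
print(doubled == raw)  # True
```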


## Variables

Role of **variables** in Python:

-   informally: names you can give to objects
-   a variable is always bound to its **value**, a single object
-   the value of a variable can **change at runtime**
-   code containing variables will act on different objects depending on
    the variables' values; this allows a program to perform the same
    action on a sequence of objects (automation!)

Restrictions for **identifiers** (variable names):

-   can only consist of letters, digits, and underscores
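-   must **not start with a digit** (by convention, variable names start with a lowercase letter)
-   must not be a reserved keyword such as `if`, `for`, or `while`\
    (`print` is a built-in function rather than a keyword in Python 3, so assigning to it is legal, but it overwrites the function and should be avoided)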
-   can start with underscore(s), but this has special semantics
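These rules can be checked from within Python. The sketch below uses the string method `isidentifier()` and the standard-library module `keyword`; the candidate names are invented for illustration.

```python
import keyword

# isidentifier() checks the shape of a name (letters, digits, underscores,
# no digit at the start); it does not check for reserved keywords.
print("my_variable".isidentifier())  # True
print("word2".isidentifier())        # True
print("2nd_word".isidentifier())     # False: starts with a digit
print("my-variable".isidentifier())  # False: hyphen is not allowed

# Keywords have a valid shape but are reserved by the language.
print("class".isidentifier())        # True
print(keyword.iskeyword("class"))    # True, so it cannot be a variable name
```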

The **assignment operation**

-   binds a variable to an object, assigning it as the variable's value
-   is written with a **single equal sign**:

In [13]:
my_variable = "some String literal"


-   causes the variable to evaluate to the object:

In [14]:
print(my_variable)
my_variable = 1
print(my_variable)


some String literal
1


-   is NOT the same as comparison (= in mathematics)
-   if a different object was bound to the variable, and no other
    variable refers to the object, it will be lost!
-   this mechanism is used to discard unnecessary objects
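A minimal sketch of what rebinding means when two variables refer to the same object. The names `first_name` and `alias` are chosen for illustration only.

```python
first_name = "Ada"
alias = first_name            # both variables are bound to the same string object
print(alias is first_name)    # True

first_name = "Grace"          # rebinding first_name leaves alias untouched
print(first_name)             # Grace
print(alias)                  # Ada (the old object survives, alias still refers to it)
```

If `alias` had not been created, the string `"Ada"` would no longer be reachable after the second assignment.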

# The String Datatype

Important facts about string objects in Python 3:

-   they are a basic data type, i.e. a lot of behavior is predefined
-   by default, they are in Unicode (support all languages)
-   can be manipulated by a number of methods,\
    all of which return new string objects (strings are **immutable**)
-   internally, strings are sequences of characters

### String Inspection: `len()`

Retrieving length information about a string:

-   **`len()`** can be applied to a string to return its length:

In [15]:
len("adjective")

9

`len()` is an example of a **built-in function** which can also be
applied to other objects representing sequences

### String Inspection: `startswith()` and `endswith()`

Checking how a string begins and ends:

-   a **method** is a function associated with an object, and is called
    by `object.method(argument1, argument2, ...)`
-   a method call is an expression which evaluates to an object
-   **`startswith()`** and **`endswith()`** are methods which can be
    called on a string to check whether it starts or ends with a given
    string

In [17]:
word = "adjective"

In [18]:
word.startswith("a")

True

In [19]:
word.startswith("A")

False

In [22]:
word.endswith("ve")

True

In [23]:
word.endswith(word)

True
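As a further illustration, the results of these method calls can be stored in variables like any other object. The example word and variable names are assumptions made for this sketch.

```python
word = "unbelievable"

has_un_prefix = word.startswith("un")
has_able_suffix = word.endswith("able")

print(has_un_prefix)           # True
print(has_able_suffix)         # True
print(word.startswith("Un"))   # False: the check is case-sensitive
print(word.endswith("un"))     # False: "un" occurs only at the beginning
```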

### String Inspection: `in` and `index()`

What if we are interested in all characters of a string?

-   the **`in`** operator asks whether an object is contained in a
    sequence:

In [24]:
"är" in "Lärm"

True

-   **`index()`** returns the position of an object inside a sequence\
    (counting from 0, so **position X is after the Xth letter**)

In [None]:
"adjective".index("ctive")

4


using `index()` to demonstrate an error message

In [26]:
"adjective".index("stive")


ValueError: substring not found
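One way to avoid the error is to test with `in` before calling `index()`. The sketch below also shows the string method `find()`, which is not covered above; it behaves like `index()` but returns `-1` instead of raising an error.

```python
word = "adjective"

print("ctive" in word)       # True
print(word.index("ctive"))   # 4: "ctive" begins after the 4th letter

print("stive" in word)       # False, so index("stive") would raise a ValueError
print(word.find("stive"))    # -1: find() signals "not found" without an error
```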

### String Operations: Slicing Part I

To cut out a piece of a string, we use **slicing**:

-   `string[start:end]` produces a copy of the substring starting at
    position `start` and ending just before position `end` (the character
    at `end` itself is not included)

In [27]:
finnish_word = "uteliaisuudessaan"


In [28]:
finnish_word[7:10]


'suu'

In [29]:
finnish_word[1:2]

't'

-   leaving `end` empty extends the slice to the end of the string (leaving `start` empty begins it at the start of the string):

In [30]:
word = "dissatisfaction"

In [31]:
word[word.index("dis")+3:]

'satisfaction'
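The following sketch illustrates two consequences of the end position being excluded: a slice has `end - start` characters, and cutting a string at any position yields two parts that fit together again. The variable `cut` is introduced only for this example.

```python
word = "uteliaisuudessaan"

print(word[7:10])        # suu (positions 7, 8 and 9)
print(len(word[7:10]))   # 3, i.e. 10 - 7

cut = 7
print(word[:cut])        # uteliai
print(word[cut:])        # suudessaan
print(word[:cut] + word[cut:] == word)   # True for any cut position
```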

### String Operations: Slicing Part II

More advanced features of slicing:

-   a position `-X` will be interpreted as `len(string) - X`

In [32]:
german_verb = "schreiben"

In [33]:
german_verb[-2:]

'en'

In [34]:
stem = german_verb[:-2]

In [35]:
stem

'schreib'

-   a third number $k$ will give you every $k$th letter (a negative $k$ walks through the string backwards):

In [36]:
cvcvcv_form = "pesukone"

In [39]:
cvcvcv_form[::2]

'pskn'

In [40]:
cvcvcv_form[1::2]

'euoe'

In [41]:
cvcvcv_form[:3:-1]

'enok'
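A short sketch that spells out the equivalence between negative positions and `len(string) - X`, and shows the common idiom of reversing a string with a step of `-1`. The example reuses the word from above.

```python
word = "pesukone"

# A negative position counts from the end of the string.
print(word[-4:])              # kone
print(word[len(word) - 4:])   # kone (the same slice written out explicitly)
print(word[:-4])              # pesu

# A step of -1 with both positions left empty reverses the whole string.
print(word[::-1])             # enokusep
```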

### String Operations: `lower()` and `upper()`

Often, we want to normalize case in words and texts:

-   the string method **`lower()`** produces a lowercase copy:

In [42]:
sentence = "This is a sentence."

In [43]:
first_word = sentence[0:sentence.index(" ")]

In [44]:
normalized_word = first_word.lower()

In [45]:
normalized_word

'this'

-   the less frequently used **`upper()`** converts everything to uppercase:

In [47]:
finnish_word.upper()

'UTELIAISUUDESSAAN'

-   works for all Unicode characters with case distinctions!
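A few illustrative calls on non-ASCII strings, plus a case-insensitive comparison as a typical use of normalization. Note that the result can differ in length from the input: the German `ß` is uppercased to `SS`.

```python
print("Lärm".upper())      # LÄRM
print("ДОБРО".lower())     # добро
print("Straße".upper())    # STRASSE (one letter becomes two)

# Normalizing both sides makes a comparison case-insensitive.
print("Hello" == "HELLO")                   # False
print("Hello".lower() == "HELLO".lower())   # True
```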

### String Operations: `replace()`

Many powerful transformations can be expressed as replacement:

-   the **`replace()`** method replaces all instances of its first
    argument with the second argument:

In [48]:
sws = "string with spaces"

In [53]:
sws.replace(" ", "_").replace("space", "underscore")

'string_with_underscores'

-   a third argument limits the number of replacements:

In [51]:
"string with spaces".replace(" ","_",1)

'string_with spaces'
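Because strings are immutable, `replace()` never changes the string it is called on. A minimal sketch, with variable names chosen for illustration:

```python
original = "string with spaces"
changed = original.replace(" ", "_")

print(changed)    # string_with_spaces
print(original)   # string with spaces (the original object is unchanged)

# To keep the result under the same name, rebind the variable.
original = original.replace(" ", "_")
print(original)   # string_with_spaces
```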

### String Operations: Concatenation

Finally, strings can be appended to other strings:

-   for numbers, `+` means addition

In [56]:
12 + 13

25

-   for strings, `+` means **concatenation**

In [57]:
"from start " + "to end "

'from start to end '

In [58]:
second_singular_form = stem + "st"

In [59]:
"du " + second_singular_form

'du schreibst'
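Slicing and concatenation can be combined to build several word forms from one stem. The sketch below assumes a regular German verb; the variable names are invented for this example.

```python
infinitive = "schreiben"
stem = infinitive[:-2]            # cut off the ending "en"

first_singular = "ich " + stem + "e"
second_singular = "du " + stem + "st"
third_singular = "er " + stem + "t"

print(first_singular)    # ich schreibe
print(second_singular)   # du schreibst
print(third_singular)    # er schreibt
```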

Objects and Basic Datatypes
===========================

Objects and Memory
------------------

### Objects and Memory: Basics

Relation between objects and memory (simplified):

- every object in a running Python process is represented in the
  computer's memory by some bit pattern (example: `"hi"`):

*(Figure: the bit pattern representing the string `"hi"` in memory.)*



- for each variable, Python keeps track of the memory address at which
  the current value is stored, and reads it from there when necessary

- each object occupies a certain amount of memory ($\approx$ space)

- in practice, this limits the number of objects you can have access
  to at the same time (for string data: about 1 billion characters per
  GB)

- creating a new object forces Python to allocate some memory to it; if Python runs out of memory, it asks the operating system for more


### Objects and Memory: Deallocation

How memory occupied by objects is freed up again:

-   a **reference counter** for each object counts how many variables
    and other objects point to the object

-   once the reference counter reaches 0, the object is inaccessible

-   inaccessible objects are destroyed immediately, and the memory they
    were stored in is deallocated (marked for reuse)

-   problem: objects can reference each other (creating cycles), these
    are resolved by a **garbage collector** which is called periodically

-   eventually, Python might hand over control of part of its memory
    back to the operating system if it doesn't need it any more


What is a Data Type?
--------------------
### What is a Data Type?
The **data type** of an object

- tells Python how to interpret the object's representation in memory
- determines which built-in methods can be called on the object (e.g. `replace()` and `lower()` on strings)
- **`type()`** returns the data type of an object:


In [60]:
type("hi")

str

In [61]:
type(print)

builtin_function_or_method

- already known: string type (from today)
- additional basic data types (numbers and truth values): now
- complex data types (lists, tuples, dictionaries): Sessions 05/06
- custom types (classes): Session 08
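As a brief preview, `type()` can be applied to literals of each basic type. When the result is passed to `print()`, it is displayed in the form `<class '...'>`.

```python
print(type("24"))    # <class 'str'>
print(type(24))      # <class 'int'>
print(type(24.0))    # <class 'float'>
print(type(True))    # <class 'bool'>

# The result of type() can be compared with the type names themselves.
print(type("24") == str)   # True
print(type("24") == int)   # False: quotes make it a string, not a number
```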

### Numeric Data Types: Integers

The type **`int`** represents positive or negative integers
(*Ganzzahlen*):
- optional prefix `+` for positive, mandatory `-` for negative integers
- work just like strings as arguments to the `print` function:


In [62]:
total = 34 + 67 - 12 + -18

In [63]:
print(total)

71


- important: no support for thousands separators! (neither English-style commas nor rest-of-the-world-style points)

In [64]:
large_int_en = 100,000,000
print(large_int_en)

(100, 0, 0)


In [65]:
type(large_int_en)

tuple

In [68]:
large_int = 100.000.000

SyntaxError: invalid syntax (2290096579.py, line 1)
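Two related features that go beyond the slide: since Python 3.6, underscores may be used to group digits in a number literal, and a format specification can insert commas when a number is displayed. The f-string syntax used for the latter is not covered above.

```python
large_int = 100_000_000     # underscores are ignored by Python
print(large_int)            # 100000000
print(type(large_int))      # <class 'int'>

# Thousands separators exist only in the displayed text, not in the number.
print(f"{large_int:,}")     # 100,000,000
```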

### Numeric Data Types: Floating Point Numbers

The type **`float`** represents real numbers as floating point numbers
(*Fließkommazahlen*) of limited (albeit very high) precision:

- the point (`.`) is used as the decimal separator in programming:

In [69]:
quotient = 2.5/5
print(quotient)

0.5


In [77]:
round(1.5)

2

- once in the realm of floats, numbers will never evaluate as
  integers, even if they could be represented as such

In [70]:
2.5/2.5

1.0

In [71]:
1.0*2

2.0

- function **`round()`** can be used to convert floats back into
  integers:


In [72]:
round(-1.3)

-1

In [73]:
round(7.5/2.5)

3
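The limited precision of floats and the behavior of `round()` can be seen in a few lines. Two details are worth knowing: values exactly halfway between two integers are rounded to the nearest even integer, and `int()` cuts off the fractional part instead of rounding.

```python
# Limited precision: 0.1 and 0.2 cannot be represented exactly.
print(0.1 + 0.2)                    # 0.30000000000000004
print(0.1 + 0.2 == 0.3)             # False
print(round(0.1 + 0.2, 2) == 0.3)   # True: round to 2 decimal places first

# Halfway cases go to the nearest even integer.
print(round(1.5))   # 2
print(round(2.5))   # 2
print(round(3.5))   # 4

# int() truncates towards zero.
print(int(7.9))     # 7
print(int(-7.9))    # -7
```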

Boolean Data Type
-----------------

### The Boolean Data Type

The type **`bool`** represents truth values:

- there are only two objects of type `bool`:

  -   **`True`** represents "yes"

  -   **`False`** represents "no"

- many functions and methods which answer yes-no-questions\
  (naming convention: `is...()`) do so by returning Boolean values:

In [80]:
"UGA".isupper()

True

In [82]:
"uga".islower()

True

- Boolean objects can be stored in variables like any object:

In [83]:
is_lowercase = "UGA".islower()
print(is_lowercase)

False
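A few more `is...()` methods on strings, sketched with an example word chosen for illustration. Each call returns an object of type `bool`.

```python
word = "Tübingen"

starts_with_capital = word[0].isupper()
only_letters = word.isalpha()
only_digits = word.isdigit()

print(starts_with_capital)         # True
print(only_letters)                # True: "ü" counts as a letter
print(only_digits)                 # False
print(type(starts_with_capital))   # <class 'bool'>
```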


### Typecasting

Sometimes it is necessary to convert between types using
**typecasting**:

- trying to add an integer to a string leads to this:

In [84]:
print("The result is " + 24 + ".")

TypeError: can only concatenate str (not "int") to str

- **`str()`** converts other types to strings:

In [85]:
print("The result is " + str(24) + ".")

The result is 24.


- **`int()`** and **`float()`** generate number objects from string
  representations, but raise a `ValueError` if the format is wrong:


In [87]:
int("24") + 2

26

In [89]:
float("34.5") + 2

36.5
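The sketch below shows one case in which `int()` fails and one possible way to react to it. The `try`/`except` construct is not covered in this session, and `user_input` is a hypothetical variable standing in for text read from elsewhere.

```python
user_input = "24.5"

try:
    number = int(user_input)      # fails: "24.5" is not an integer format
except ValueError:
    number = float(user_input)    # fall back to a floating point number

print(number)          # 24.5
print(type(number))    # <class 'float'>

# Converting back to a string allows concatenation again.
print("The result is " + str(number) + ".")   # The result is 24.5.
```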