<div>
<img src="img/python_logo.png" width="100px" style="float: left; "/> 
<div style="font-size: 40px; padding-top: 20px">Python basics</div>   
<div style="font-size: 30px; padding-top: 20px">Part 1 - Objects</div>   
</div>

 


<h2 style="clear: both">COMM4190 Spring 2025</h2>

<h3>Instructor: Matt O'Donnell (mbod@asc.upenn.edu)</h3>

-----

<div class="alert alert-info">

## Overview

* This notebook will cover:
    1. __OBJECTS__, which are a core concept in Python programming: an __object__ holds data of different _types_ and a group of actions (*functions*) that can be applied to these data
        * __Data types__ - data comes in different kinds/flavors, e.g. numbers, fractions, proportions, categories, text, etc., that are best represented by different objects (e.g. `str`, `int`, `float`). 
    2. __String objects__ - to work with text data we will most often use the _string_ data type in a `str` object.
        * __Error messages__ - when the code is not quite right you will receive an _error message_ when you run a code cell. THIS WILL HAPPEN A LOT! But trying to understand the error message can help you figure out how to fix your code.
        * __Names__ or __Named Pointers__ - labels that can be attached to objects to make it possible to reference the object in subsequent steps
        * __String functions__
    3. __String Indexing and Slicing__
    
</div>
    
------

## 1. Objects


* __Objects are _things_ that have _properties_ (DATA) and related _actions_ they can do (FUNCTIONS)__. 


* An object in a programming language like Python, is a way to represent or model THINGS in the real world. Things tend to have features or properties (shape, size, weight, number of legs, and so on) and actions they are involved in (bouncing, rolling, walking, running, being filled, and so on).


* For example, in _Goldilocks and the Three Bears_ the following things can be modeled as objects.

<div style="text-align:center">
<img src="img/goldilocks_miyako.png" />
<div style="font-size: 8px; text-align: center; line-height: 1.2; margin-bottom: 20px;">
    Illustration by Miya Huang<br/>
    https://miyako945.wixsite.com/artofmiya/portfolio?lightbox=dataItem-jzl1fmjg2
</div>
    
</div>

|  OBJECT | PROPERTIES     | FUNCTIONS |
| :-----: | :--------:     | :-------: |
|  __table__  | legs (4), top      | support, fold      |
|  __bowl__   | size, weight, full/empty | be filled/emptied, be washed |
|  __chair__  | legs (4), seat, arms | be sat/stood on |
|  __bed__    | legs (4), mattress   | be slept in |
|  __girl__   | legs (2), arms, hands, feet, head, name, hair, age | walk, eat, think, sleep |
|  __bear__   | legs (4), paws, fur, eyes | run, climb, eat, sleep, speak(?) |


* Objects can also be referred to as __CLASSES__ in some contexts.

### Data types

* Objects can hold pieces of data (e.g. numbers, text, logical (`True/False`) values) and combinations and sequences of data.


* The `TYPE` of an object provides structure in terms of the properties and functions.


* Python takes care of tracking and worrying about the type of an object (i.e. the __data types__) most of the time. So it is easier to work with than some other programming languages that need you to always be aware of the data type. But there are places where it is important and you might come across a `TypeError`.

## 2. _String_ (`str`) objects  

* A string in Python is a way to represent a word or a piece of text data.


* A string is a _sequence_ (or list) of characters.


* To define a string in Python you use a pair of __matching__ quotes (either single or double), e.g.

In [None]:
"this is a string object"

In [None]:
'this is also a string object'

* **NOTE** that both single `'` and double `"` quotes can be used to mark the beginning and end of the text data. __BUT__ the opening and closing quotes must match.
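
* One practical reason to have both quote styles is that a string can then contain the other kind of quote as ordinary text. The sketch below uses made-up sentences for illustration, and it uses `print()`, which is introduced in section 2.1:

```python
# Double quotes on the outside: the apostrophe inside is just a character
print("Goldilocks didn't knock twice.")

# Single quotes on the outside: the double quotes inside are just characters
print('Papa Bear growled, "Someone has been eating my porridge!"')
```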

### 2.1. General functions

* There are some _general functions_ in Python that can be used on most objects.


* The syntax for these functions is:
    * `function_name(object)` <br/><br/>



* Some of the most commonly used include:
    * `print` - display a representation of the data
    * `len` - the size (length) of the data held by the object
    * `type` - show what kind of object this is
    * `dir` - list the features and functions/methods of the object


* For example, the `print(obj)` function will display a representation of the data contained in an object

In [None]:
print("this is a string object")

* And the `len(obj)` function will return a data type specific indication of the size of the data in an object.

* For a `str` object this is the number of _characters_:

In [None]:
len("this is a string object")

* There are `23` characters in the string object `this is a string object`
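
* It may help to see that `len()` counts every character, including spaces and punctuation. A small sketch with made-up strings:

```python
# len() counts letters, spaces and punctuation alike
print(len("bear"))          # 4
print(len("three bears"))   # 11 (5 letters + 1 space + 5 letters)
print(len("bears!"))        # 6  (5 letters + 1 exclamation point)
print(len(""))              # 0  (an empty string holds no characters)
```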

* The `type(obj)` general function will return the data type of the `obj` given as the argument

In [None]:
type("this is a string object")

In [None]:
type(11)

In [None]:
type(11.2)
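
* As a further sketch, the quotes decide the type, not what the characters look like. The values below are chosen only for illustration:

```python
# The same digits can be an int or a str, depending on the quotes
print(type(11))      # <class 'int'>
print(type("11"))    # <class 'str'>

# A decimal point makes a float, even when the value is a whole number
print(type(11.0))    # <class 'float'>

# Logical values have their own type
print(type(True))    # <class 'bool'>
```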

### 2.2. ERRORS!

* You __ARE__ going to frequently encounter error messages as you work on learning Python and doing data analysis in notebooks.


* This is __NOT__ something to be concerned about. 


* They do not mean you got something wrong... they just indicate you haven't got it quite right __YET__! 


* And the content of the error message is supposed to help you figure out how to fix it!

In [None]:
"this is a string'

* Notice that the start and end quote characters do not match

* This produces a `SyntaxError`

* Some of the operators (like addition `+` and subtraction `-`) that we will learn will only work with certain `types`


* For instance, you can add two numeric types together:

In [None]:
# two int objects

1+15

In [None]:
# a float and an int

15.1 - 2303

* But if you try to add a number and a string object you will see a `TypeError`

In [None]:
# an int and a str object

1+'abc'
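
* One possible way to fix a `TypeError` like this is to convert one of the objects so that both have the same type. The sketch below assumes the built-in `str()` and `int()` conversion functions, which are not otherwise covered in this notebook:

```python
# Option 1: turn the int into a str, then + joins the two strings
print(str(1) + 'abc')    # 1abc

# Option 2: turn a str made only of digits into an int, then + adds numbers
print(1 + int('15'))     # 16
```

* Which option is right depends on whether you want text joined together or numbers added.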

### 2.3. Names or Named pointers

* We can define a named pointer to link to an object so that we can refer to it in other parts of our code.


* The syntax for this is:
    * `pointer_name = OBJECT`


* For example, the following code cell creates a `str` object with the first sentence of _Goldilocks and the Three Bears_ and gives this object the __name__ `sent1`:

In [None]:
sent1='Once upon a time, there was a little girl named Goldilocks.'

* Now we can get to (or reference) the string object using the pointer/name `sent1`

In [None]:
print(sent1)

In [None]:
len(sent1)

In [None]:
type(sent1)

* __NOTE__ if you try and use a named pointer that hasn't been defined you will get a `NameError`

In [None]:
print(sent2)

### 2.4. String functions

* Functions (or methods) that are <u>specific to an object</u> can be applied to an instance of an object using the _dot notation_ where the syntax is:
    * `object.function()`
    
    
* For example, applying the `upper()` function on a string object will transform all the characters in the string into uppercase form:

In [None]:
sent1.upper()

* In this example:
    * `sent1` is a named pointer that references a `str` object
    * The data in that string object is:
        * `'Once upon a time, there was a little girl named Goldilocks.'`
    * So three things are happening when you run `sent1.upper()`:
        1. Python finds the named pointer `sent1`
        2. and follows it to the `str` object
        3. then it executes that object's `.upper()` function and returns the result (which is a new `str` object)
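
* The three steps above can be sketched in code. The name `result` is made up for this illustration, and the `is` operator (not covered in this notebook) is used only to check whether two names point to the very same object:

```python
sent1 = 'Once upon a time, there was a little girl named Goldilocks.'

# Steps 1 and 2: the name sent1 leads Python to the str object
# Step 3: .upper() runs and hands back a new str object
result = sent1.upper()

print(result)
print(type(result))      # <class 'str'>
print(result is sent1)   # False: the result is a different object
```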
        

* Similarly, the `lower()` function transforms all the characters in a string into lower case.

In [None]:
sent1.lower()

* And the `title()` function makes the first character of each 'word' an uppercase character and the rest lowercase. Here a 'word' is any run of letters, so whitespace and punctuation both mark the start of a new word.

In [None]:
sent1.title()
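
* One thing to watch for: `title()` treats any run of letters as a word, so an apostrophe starts a new 'word'. A small sketch with made-up phrases:

```python
# Each word separated by spaces gets a capital letter
print('once upon a time'.title())    # Once Upon A Time

# The letter after an apostrophe is also capitalized
print("the bear's chair".title())    # The Bear'S Chair
```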

* __NOTE__ that applying these functions to the object pointed to by `sent1` does not change the object itself.


* If we follow the pointer `sent1` to the string object, we see the original form of the string is still there: 

In [None]:
sent1

* So we can use another pointer to keep track of the result

In [None]:
sent1_UC = sent1.upper()

In [None]:
print('sent1 points to =>', sent1)
print()
print('sent1_UC points to =>', sent1_UC)

In [None]:
sent2='She went for a walk in the forest.'
sent3='Pretty soon, she came upon a house.'
sent4='She knocked and, when no one answered, she walked right in.'

* Some of the specific string object functions are __LOGICAL TESTS__ that will test whether certain things are TRUE or FALSE about the data.


* For example, the `.startswith(test_str)` function compares the beginning of the string data in a string object with the `test_str` and returns `True` if it matches and `False` if it does not. 

In [None]:
sent2.startswith('S')

* First, use the name `sent2` to get to the `str` object:
    ```
    'She went for a walk in the forest.'
    ```
* Then apply the string specific function `.startswith('S')` to this object to test whether the data begins with an uppercase `S`.

* It does so the returned result is `True`

---

* The `.startswith()` function can take more than one character as the `str` __argument__.


In [None]:
sent2.startswith('She')

* Here we tested to see if the data in the `str` object referenced (pointed to) by the name `sent2` begins with the three characters `S` followed by `h` followed by `e`


* __NOTE__ that this function is _case sensitive_ so you will get a different result testing `She` and `she`

In [None]:
sent2.startswith('she')
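
* If you want a test that ignores case, one possible approach is to lowercase the data first and then test against a lowercase string. A minimal sketch:

```python
sent2 = 'She went for a walk in the forest.'

# Case sensitive: 'S' and 's' are different characters
print(sent2.startswith('she'))            # False

# Lowercase the data first, then test against a lowercase string
print(sent2.lower().startswith('she'))    # True
```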

* The `.endswith()` string specific function does something similar but starts from the end of the data in a `str` object

In [None]:
sent3.endswith('!')

* `sent3` points to a `str` object with data
   ```
    'Pretty soon, she came upon a house.'
   ```

* The last character is a `.` and not `!` so the test returns `False`
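
* Like `.startswith()`, the `.endswith()` function can take more than one character. A short sketch using the same sentence:

```python
sent3 = 'Pretty soon, she came upon a house.'

print(sent3.endswith('.'))         # True
print(sent3.endswith('house.'))    # True
print(sent3.endswith('house'))     # False: the final character is '.'
```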


----

* Another string specific function that works as a logical test on the data is `.islower()`. As you might imagine, it looks at all the letters and returns `True` if __ALL OF THEM__ are in lowercase; otherwise it returns `False`. Spaces and punctuation are ignored by this test.

In [None]:
sent4.islower()

In [None]:
# take a look at the data in the str
# object referenced by sent4

sent4

* The first character is uppercase so the `.islower()` test returns `False`


* Remember we can transform the case of characters in a `str` object using `.lower()`

In [None]:
sent4.lower()

* And then we could "chain" the two functions together like this:

In [None]:
sent4.lower().islower()

* First use the `sent4` named pointer to reference the `str` object

* Then apply the `.lower()` string specific function to transform the data to ALL LOWERCASE. This results in:
    ```
    'she knocked and, when no one answered, she walked right in.'
    ```

* Then apply the `.islower()` test to the result
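
* The chain can also be written one step at a time, which may make the order of events easier to follow. The names `lowered` and `check` are made up for this sketch:

```python
sent4 = 'She knocked and, when no one answered, she walked right in.'

# Step 1: .lower() returns a new, all-lowercase str object
lowered = sent4.lower()

# Step 2: .islower() is applied to that new object
check = lowered.islower()

print(lowered)
print(check)                                # True
print(check == sent4.lower().islower())     # True: same answer as the chain
```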

----

#### The `dir()` and `help()` functions

* The `dir()` function lists the functions that belong to a particular class. You do not often need to use it or worry about the details at this stage, and you can ignore the items beginning with double underscores for now.


* But if you look down the list you will see some of the things that can be done with string objects.

In [None]:
dir(str)

In [None]:
help(str.strip)

* We can test out the `.strip()` function

In [None]:
"    Once upon a time   ".strip()

### 2.5. An example to illustrate using `str` objects and named pointers

In [None]:
'BEARS!'

* the `'` quotes around the output indicate that it is a string type object
* using the `print()` function gives a prettier formatted version of the object (including any layout characters like line feeds etc)

In [None]:
print('BEARS!')

* we can apply functions to this string

In [None]:
len('BEARS!')

In [None]:
'BEARS!'.lower()

* BUT each time we retype it we are creating another string object and after the line of code has been executed we have no way of getting back to that same object
* It is 'orphaned' in the Python memory workspace!

![](img/pointer1.png)

* when we do something to it like `replace` the exclamation point with an empty string (`''`)
* the result is also a new object that is orphaned 

In [None]:
'BEARS!'.replace('!','')

![](img/pointer2.png)

* So we can assign a named pointer to the string object and then we have a way to get back to it...

In [None]:
text = 'BEARS!'

![](img/pointer3.png)

In [None]:
print(text)

In [None]:
print(text*5)

In [None]:
print(text.lower())

* **BUT** notice if we do something to the object pointed to by the name `text` it doesn't change the object

In [None]:
print("This is result of calling replace('!','') on text >>>", text.replace('!',''))
print()
print("The current object pointed to by text is >>>", text)

* **SO** to get the behavior we might have expected, that is, that `text` would point to the result of stripping the `!` we need to reassign the pointer:

In [None]:
text = text.replace('!','')

* this does the following:

![](img/pointer5.png)

In [None]:
print("The current object pointed to by text is >>>", text)
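
* To see that reassigning moves the pointer rather than changing the original object, here is a sketch that keeps a second name on the original. The name `original` is made up for this illustration:

```python
text = 'BEARS!'
original = text                  # a second name for the same str object

# Reassign text so that it points to the result of replace()
text = text.replace('!', '')

print(text)        # BEARS
print(original)    # BEARS!  (the first object is unchanged)
```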

In [None]:
text2 = 'BEARS!'

In [None]:
print('text >>> ', text)
print()
print('text2 >>> ', text2)

![](img/pointer7.png)

In [None]:
text2=text.lower()

In [None]:
print('text >>> ', text)
print()
print('text2 >>> ', text2)

![](img/pointer8.png)

## 3. Indexing and slicing of a `str`

### 3.1. Indexing

* A string is a sequence (or list) of characters so we can refer to specific characters in the string using **INDEXING**
* In Python indexes begin at **ZERO**/**0**
* So `'bear'[1]` is the character `e` and *not* `b`, which is `'bear'[0]`

In [None]:
print('bear'[1])

* indexing the `str` object `bear` with index 1 will return __THE SECOND__ character `e`

In [None]:
print('bear'[0])

* We use index `0` to get the first character 


* We can do the same using a named pointer

In [None]:
word = "Bear!"

In [None]:
# index 1 -> 2nd character

word[1]

In [None]:
# index 3 -> 4th character 

word[3]

* We can also go backwards from the end of the string using __negative indices__:

In [None]:
word[-1]

![](img/string-indexing.png)
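
* A short sketch of how negative indices line up with the usual ones, using the same `word`:

```python
word = "Bear!"

# -1 is the last character, the same as index len(word) - 1
print(word[-1])               # !
print(word[len(word) - 1])    # !

# Counting further back from the end
print(word[-2])               # r
print(word[-5])               # B  (the first character)
```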

* __NOTE__ if you try to use an index beyond the length of the data you'll get an `IndexError`

In [None]:
len(word)

* `word` references a `str` object with 5 characters
    ```
    'Bear!'
    ```
    
* So there are five indices:
    ```
    0, 1, 2, 3, 4
    ```
    
* If we try to retrieve the 6th character (index `5`)

In [None]:
word[5]

* we get an `IndexError`
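
* One way to stay inside the valid range is to work out the last index from `len()`. In this sketch the name `last_index` is made up, and the `<` comparison (not covered in this notebook) simply checks whether an index is small enough:

```python
word = "Bear!"

# The last valid index is always one less than the length
last_index = len(word) - 1
print(last_index)          # 4
print(word[last_index])    # !

# An index is valid only if it is smaller than the length
print(4 < len(word))       # True
print(5 < len(word))       # False: word[5] would raise an IndexError
```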

### 3.2. Slicing
* A **SLICE** is a contiguous sequence of characters in a string

In [None]:
word[0:3]

* The start index is **inclusive** and the end index is **exclusive**
* But better to understand the indexes as points **BEFORE** a character just like in the figure above
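
* A few more slices of `word` as a sketch. Leaving out the start or end index is standard Python shorthand for "from the beginning" or "to the end":

```python
word = "Bear!"

print(word[0:3])    # Bea  (indices 0, 1, 2)
print(word[1:4])    # ear  (indices 1, 2, 3)
print(word[:3])     # Bea  (start left out: begin at index 0)
print(word[3:])     # r!   (end left out: run to the end)

# The number of characters in a slice is end minus start
print(len(word[1:4]))    # 3
```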

In [None]:
# remember sent1
sent1

* Let's take a slice of the first 18 characters

In [None]:
sent1[0:18]
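
* As a quick check of the slice above, and a sketch of slicing with negative indices, the name `opening` is made up for this illustration:

```python
sent1 = 'Once upon a time, there was a little girl named Goldilocks.'

# The first 18 characters include the space after the comma
opening = sent1[0:18]
print(opening)
print(len(opening))      # 18

# Negative indices work in slices too: the 10 characters before the final '.'
print(sent1[-11:-1])     # Goldilocks
```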