# <font color = firebrick>Tutorial 5: Slicing and Functions </font><a id='home'></a>
    
In this tutorial, we continue to explore Python fundamentals, focusing on the essential skills needed for digital economics and data analysis. By the end, you’ll be able to work with data slices effectively, a key technique for handling user-generated data or breaking down large datasets. We’ll also introduce Python functions, which will help us streamline data processing in tasks such as web scraping or social media analytics.

Topics covered:
    
1. [Slicing](#subsets)
2. [Functions](#functions)
3. [Objects and TAB completion](#tab)



# 1. Slicing <a id="subsets"></a> ([top](#home))
Slicing allows us to extract specific portions of data from lists, tuples, or strings. This is especially useful in digital economics when dealing with data such as time-series of website visits or breaking down sections of a large social media dataset. Python’s slicing syntax uses square brackets `[]` and works similarly across various data structures:

```python
data[start:stop:stride]
```
- `start`: The starting index (inclusive)
- `stop`: The ending index (exclusive)
- `stride`: Step size, i.e. the difference between consecutive selected indexes (a stride of 2 takes every second element)

For example, we could use slicing to analyze the most recent website visitors or get every second data point in a user engagement log. The default step is 1, meaning it takes every element between start and stop by default.

In [23]:
user_engagement = [105, 204, 304, 404, 505]

print(user_engagement, '# Initial list') 
print(user_engagement[0:2])   # First two records
print(user_engagement[0:4:1]) # First four records, with an explicit step of 1
print(user_engagement[0:5:2]) # Every second record

[105, 204, 304, 404, 505] # Initial list
[105, 204]
[105, 204, 304, 404]
[105, 304, 505]
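
Two details of the slicing behaviour described above are easy to check directly. The short sketch below reuses the `user_engagement` list from the tutorial; the name `first_two` is introduced here only for illustration.

```python
user_engagement = [105, 204, 304, 404, 505]

# A slice of a list is a new list, so changing it leaves the original untouched
first_two = user_engagement[0:2]
first_two[0] = 999
print(first_two)         # [999, 204]
print(user_engagement)   # [105, 204, 304, 404, 505]

# A stop value beyond the end of the list is clamped instead of raising an error
print(user_engagement[3:100])  # [404, 505]
```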


To slice from the beginning or to the end, omit the start or stop argument.

In [24]:
print(user_engagement, '# Initial list') 
print(user_engagement[2:])    # From third to last record
print(user_engagement[:4])    # First four records

[105, 204, 304, 404, 505] # Initial list
[304, 404, 505]
[105, 204, 304, 404]


### Splitting data with slices
Slicing helps when splitting data into segments, such as dividing user engagement metrics into two halves for comparison.

In [29]:
print(user_engagement, '# Initial list') 
first_half = user_engagement[:3]
second_half = user_engagement[3:]
print(first_half, '\n', second_half)

[105, 204, 304, 404, 505] # Initial list
[105, 204, 304] 
 [404, 505]
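
The split point does not have to be typed by hand. As a minimal sketch, assuming we want the two parts to be as equal as possible, the midpoint can be computed with integer division. The names `midpoint`, `left`, and `right` are illustrative only.

```python
user_engagement = [105, 204, 304, 404, 505]

# Integer division rounds down, so an odd-length list puts the extra item in the second part
midpoint = len(user_engagement) // 2
left = user_engagement[:midpoint]
right = user_engagement[midpoint:]
print(left, right)  # [105, 204] [304, 404, 505]

# Because stop is exclusive and start is inclusive, the two parts never overlap or leave a gap
print(left + right == user_engagement)  # True
```

This gives a 2/3 split rather than the 3/2 split used above; either is a valid choice for a five-element list.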


### Slicing with negative indexes
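Negative indexes count from the end of the sequence, so `-1` refers to the last element and `-2` to the second-last. This helps when you need the last few data points, such as the most recent user activity entries, or want to drop them.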

In [30]:
print(user_engagement, '# Initial list') 
print(user_engagement[:-1])    # All but the last entry
print(user_engagement[:-2])    # All but the last two entries
print(user_engagement[-4:-2])  # From the fourth-last up to (not including) the second-last

[105, 204, 304, 404, 505] # Initial list
[105, 204, 304, 404]
[105, 204, 304]
[204, 304]
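
To actually grab the most recent entries, put the negative index in the `start` position and omit `stop`. A small sketch with the same list:

```python
user_engagement = [105, 204, 304, 404, 505]

# A single negative index returns one element, not a list
print(user_engagement[-1])    # 505

# A negative start with no stop returns the last few entries
print(user_engagement[-3:])   # [304, 404, 505]
```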


### Using a negative step
A negative stride lets us iterate in reverse, useful for reversing ordered data like chronological web activity.

In [31]:
print(user_engagement, '# Initial list') 
print(user_engagement[::-1])   # Reverses the list
print(user_engagement[4:1:-1]) # Starts at index 4, goes back to index 2

[105, 204, 304, 404, 505] # Initial list
[505, 404, 304, 204, 105]
[505, 404, 304]
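
With a negative stride, `start` has to sit to the right of `stop`; otherwise the slice is empty. The sketch below shows that pitfall and one way to combine a negative index with a negative stride.

```python
user_engagement = [105, 204, 304, 404, 505]

# Start is left of stop while stepping backwards, so nothing is selected
print(user_engagement[1:4:-1])   # []

# Last three entries, most recent first: start defaults to the end of the list
print(user_engagement[:-4:-1])   # [505, 404, 304]
```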


Slicing also works for strings, such as extracting hashtags from a social media post:

In [32]:
tagline = '#digitalon'
print(tagline[:2])        # Gets the first two characters, '#d'
print(tagline[::-1])      # Reverses the hashtag

#d
nolatigid#
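
A common variation is separating the `#` symbol from the tag text. This is a minimal sketch using the same `tagline`; the names `symbol` and `tag_text` are illustrative.

```python
tagline = '#digitalon'

symbol = tagline[0]      # indexing returns a single character
tag_text = tagline[1:]   # slicing from index 1 drops the leading '#'
print(symbol)    # #
print(tag_text)  # digitalon
```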


## <font color='red'>Practice</font>

Try these exercises to solidify your understanding of slicing:

1. Create a variable `trend = 'Tech Economy'`
2. Slice `trend` to create the variables `category` and `focus`
3. Use negative indexing to create `category_neg` and `focus_neg` from `trend`

Now try with a sorted engagement dataset:

In [None]:
engagement_data = [125, 300, 800, 1200, 2000]

4. Print out the 3 largest engagement values
5. Print out the 2 smallest engagement values

# 2. Functions<a id="functions"></a> ([top](#home))
As in many other languages, we can define our own functions in Python. Functions help us avoid redundancy: we write a piece of code once and reuse it as often as needed, which is especially useful when working with data. Here's a simple example:

In [33]:
def engagements_to_reach(engagements):
    """
    Input an engagement count on a social media post. Estimate reach based on an average engagement rate of 5%.
    """
    reach = engagements / 0.05  # Assume 5% engagement rate
    return reach  # this is the value the function returns

After running the cell above, the function `engagements_to_reach` is available to use. In Jupyter, the `whos` magic command (an IPython feature rather than a Python statement) lists the objects we have defined so far, including this new function. \[A namespace is the collection of names we have assigned and the objects they refer to.\]

In [34]:
whos

Variable               Type        Data/Info
--------------------------------------------
engagements_to_reach   function    <function engagements_to_reach at 0x1075bcf40>
estimated_reach        float       5000.0
first_half             list        n=3
first_part             list        n=3
post_engagements       int         250
second_half            list        n=2
second_part            list        n=2
slogan                 str         onward
some_list              list        n=5
tagline                str         #digitalon
user_engagement        list        n=5


We can see the function `engagements_to_reach` listed as an object in our namespace. Now let's try using it to estimate the reach of a post based on engagements.

In [35]:
post_engagements = 250  # total likes, comments, shares
estimated_reach = engagements_to_reach(post_engagements)
print('The estimated reach of the post is', estimated_reach, 'users.')

The estimated reach of the post is 5000.0 users.
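
The 5% engagement rate is hard-coded in the function above. One possible variation, sketched here under the assumption that we might want to try other rates, passes the rate as a second argument with a default value. The name `engagements_to_reach_rate` and the parameter `engagement_rate` are hypothetical and not part of the original tutorial.

```python
def engagements_to_reach_rate(engagements, engagement_rate=0.05):
    """
    Estimate reach from an engagement count and an assumed engagement rate.
    The rate defaults to 5% when the caller does not supply one.
    """
    return engagements / engagement_rate

default_reach = engagements_to_reach_rate(250)        # uses the 5% default
custom_reach = engagements_to_reach_rate(250, 0.10)   # assumes a 10% rate instead
print(default_reach, custom_reach)
```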


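It’s good practice to check for potential issues in our functions to make them more robust. For example, passing a string instead of a number raises a `TypeError`, so a second version of the function checks the input type before doing the calculation.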

In [36]:
post_engagements = '250'  # Engagements should be a number, but here it’s a string
estimated_reach = engagements_to_reach(post_engagements)
print('The estimated reach of the post is', estimated_reach, 'users.')

TypeError: unsupported operand type(s) for /: 'str' and 'float'

In [37]:
def engagements_to_reach_v2(engagements):
    """
    Input an engagement count on a social media post. Estimate reach based on an average engagement rate of 5%.
    """
    if isinstance(engagements, (int, float)):
        reach = engagements / 0.05
        return reach
    else:
        print('error: engagements_to_reach_v2 expects a number as input.')
        return None

In [38]:
post_engagements = '250'
estimated_reach = engagements_to_reach_v2(post_engagements)
print('The estimated reach of the post is', estimated_reach, 'users.')

error: engagements_to_reach_v2 expects a number as input.
The estimated reach of the post is None users.
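
Printing a message and returning `None` lets the calling code carry on, which is why the output above still reports `None users`. A common alternative, shown here only as a sketch, is to raise an exception and let the caller decide how to respond. The name `engagements_to_reach_v3` is hypothetical.

```python
def engagements_to_reach_v3(engagements):
    """
    Estimate reach from an engagement count, assuming a 5% engagement rate.
    Raises TypeError when the input is not a number.
    """
    if not isinstance(engagements, (int, float)):
        raise TypeError('engagements_to_reach_v3 expects a number as input.')
    return engagements / 0.05

try:
    estimated_reach = engagements_to_reach_v3('250')
    print('The estimated reach of the post is', estimated_reach, 'users.')
except TypeError as err:
    # The misleading "None users" line is never printed
    estimated_reach = None
    print('Could not estimate reach:', err)
```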


When writing code, we have to balance the time we invest in handling errors against how robust the code needs to be. Now, let’s explore functions with multiple input variables.

In [None]:
def user_name_formatter(first, handle, last):
    """
    Formats a user's full name and handle (e.g., for a profile display).
    """
    return first.title() + ' ' + last.title() + ' (' + handle.lower() + ')'

In [None]:
first_name = 'ALex'
handle = '@aLexCode'
last_name = 'johnson'
formatted_name = user_name_formatter(first_name, handle, last_name)
print(formatted_name)
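
With several inputs, the order of the arguments matters: here the handle comes second, between the two names. One way to guard against mixing them up is to pass the arguments by keyword, in which case the order no longer matters. The sketch below repeats the function so that it runs on its own.

```python
def user_name_formatter(first, handle, last):
    """
    Formats a user's full name and handle (e.g., for a profile display).
    """
    return first.title() + ' ' + last.title() + ' (' + handle.lower() + ')'

# Keyword arguments are matched by name, so they can appear in any order
formatted_name = user_name_formatter(first='ALex', last='johnson', handle='@aLexCode')
print(formatted_name)  # Alex Johnson (@alexcode)
```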

**Important:** Python supports multiple assignment, meaning we can assign several variables in a single statement. Later, this will let us capture more than one result returned by a function. First, the basic syntax:

In [None]:
user_id, post_count = 12345, 98
print(user_id, post_count)

Multiple assignment also makes it easier to swap values. For example, swapping two digital IDs can be done in one line:

In [None]:
id_a = 1001
id_b = 2002
print('Before swap: id_a=', id_a, 'id_b=', id_b)
id_a, id_b = id_b, id_a
print('After swap: id_a=', id_a, 'id_b=', id_b)

Multiple assignment can be useful for functions that need to return multiple values. Here’s an example with a function that calculates both the bounce rate and conversion rate of a web page.

In [None]:
def web_metrics(visits, bounces, conversions):
    """
    Calculate the bounce rate and conversion rate based on web traffic data.
    """
    bounce_rate = (bounces / visits) * 100
    conversion_rate = (conversions / visits) * 100
    return bounce_rate, conversion_rate

# Example traffic figures for a single page
page_visits = 500
page_bounces = 150
page_conversions = 50
bounce, conversion = web_metrics(page_visits, page_bounces, page_conversions)
print(f"Bounce Rate: {bounce:.2f}% | Conversion Rate: {conversion:.2f}%")
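
Behind the scenes, a function that returns several comma-separated values returns a single tuple, and multiple assignment unpacks it. The sketch below repeats `web_metrics` so that it runs on its own; the variable name `metrics` is illustrative.

```python
def web_metrics(visits, bounces, conversions):
    """
    Calculate the bounce rate and conversion rate based on web traffic data.
    """
    bounce_rate = (bounces / visits) * 100
    conversion_rate = (conversions / visits) * 100
    return bounce_rate, conversion_rate

# Capture the result in one variable to see what is actually returned
metrics = web_metrics(500, 150, 50)
print(type(metrics))   # <class 'tuple'>
print(len(metrics))    # 2

# Unpacking the tuple gives the same two values as before
bounce, conversion = metrics
print(f"Bounce Rate: {bounce:.2f}% | Conversion Rate: {conversion:.2f}%")
```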

## <font color='red'> Practice</font>
Take a few minutes and try the following:
1. Write a function to calculate the average watch time per user on a streaming platform. Pass in total minutes watched and number of users. Test it with 1500 total minutes and 100 users.

2. Modify the `user_name_formatter()` function to return both the formatted name and the character count (without spaces). Use multiple assignment.

3. The `split(delim)` string method is useful for breaking up a string into sub-strings, such as hashtags in a social media post. The argument `delim` specifies the delimiter. Note that any text before the first delimiter is returned as the first element. Example:

In [None]:
post_text = 'Check out #DigitalEconomy #TechTrends #FutureOfWork'
hashtags = post_text.split('#')  # Use hashtag as the delimiter
print(hashtags)

# 3. Objects and TAB completion <a id="tab"></a> ([top](#home))
Python, like C++ or JavaScript, is an object-oriented language. While a computer science course could spend weeks on object-oriented programming, our goal here is to understand objects well enough to use them effectively.

In Python, *everything* is an object. Numbers, functions, lists, and even strings are all objects. This is useful because objects come with built-in **attributes** and **methods** that allow us to interact with them in powerful ways. The specific attributes and methods an object has depend on its *type*. Let's look at lists, for example:

```python
list_1 = ['a', 'b', 'c']
list_2 = [4, 5, 6, 7, 8]
```

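Both `list_1` and `list_2` are objects of type `list`, but the values of their **attributes** may vary. For instance, length is an attribute in the general sense: `list_1` has length 3, while `list_2` has length 5. (In Python, we read a list's length with the built-in function `len()`, as in `len(list_1)`.)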

**Methods** are like functions that are built into an object. Each object type has different methods available, which we access using 'dot' notation. For example:

```python
list_1.method()
```

In this case, `method()` is a method tied to the list type. We've already used methods like `lower()`, `upper()`, and `title()` with strings.

In [None]:
list_1 = ['a', 'c', 'b']
print(list_1)

In [None]:
list_1.sort()        # Using the sort() method from the 'list' type
print(list_1)
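
Note that `sort()` changes the list in place and returns `None`. Python also has a built-in function, `sorted()`, which leaves the original alone and returns a new list. The sketch below contrasts the two; the names `ordered` and `result` are illustrative.

```python
list_1 = ['a', 'c', 'b']

# sorted() is a built-in function: it returns a new list and leaves list_1 unchanged
ordered = sorted(list_1)
print(ordered)   # ['a', 'b', 'c']
print(list_1)    # ['a', 'c', 'b']

# .sort() is a list method: it reorders list_1 in place and returns None
result = list_1.sort()
print(result)    # None
print(list_1)    # ['a', 'b', 'c']
```

A frequent mistake is writing `list_1 = list_1.sort()`, which replaces the list with `None`.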

### Finding methods
How do we find out what methods are available for an object? A quick web search can help, but there's also a useful feature in Jupyter: **TAB completion**.

To try it, type `list_1.` in the cell below and press the TAB key.

Pressing TAB will show a list of available methods. For example, `append()` and `reverse()` are useful list methods. Let's try `reverse()`:

In [None]:
list_1.reverse()
print(list_1)

TAB completion also helps with variable names. Type `lis` and press TAB, and you'll see all variables in your current namespace that start with `lis`. This is especially handy for avoiding typos and saving time!
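
TAB completion is a feature of the Jupyter interface. In plain Python code, the built-in `dir()` function gives a similar listing. The sketch below filters out the names that begin with an underscore, which are mostly for internal use; the variable name `public_methods` is illustrative.

```python
list_1 = ['a', 'b', 'c']

# dir() returns the names of an object's attributes and methods as a list of strings
public_methods = [name for name in dir(list_1) if not name.startswith('_')]
print(public_methods)

# The methods mentioned in this section appear in the listing
print('append' in public_methods, 'reverse' in public_methods)  # True True
```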

## <font color='firebrick'>Practice</font>

Take a few minutes to try the following. Work with those around you if you need help:

1. Given `gdp = '18,570.50'`, convert it to a float. Use TAB completion (and Google, if needed) to find a method that removes the comma.


2. Sort the list below, then use TAB completion (`.`) and the object inspector (`?`) to insert `new_score` into the list in the correct position so it stays sorted.