
## Programming for Data Science

### Lecture 2: Data Structures, Part 1

### Instructor: Farhad Pourkamali 


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/farhad-pourkamali/CUSucceedProgrammingForDataScience/blob/main/Lecture2_DataStructures_Part1.ipynb)


### Introduction
<hr style="border:2px solid gray">

* In Python, a `variable` is a symbolic name that represents or refers to a value. It is like a container that holds information.
    * You can think of it as a label for a piece of data stored in the computer's memory.

* A `data structure`, on the other hand, is a way of organizing and storing data so that it can be used efficiently. 
    * Python provides several built-in data structures, such as
        * lists
        * tuples
        * sets
        * dictionaries
        
* These structures allow you to organize and manipulate data in different ways based on your needs.
    * For example, a list is an ordered collection of items that can be of different types, and a dictionary is a collection of key-value pairs in which each value is looked up by its key.

In [1]:
# Use the assignment operator
# The value to the right of the "equal sign" 
# is assigned to the variable on the left of the equal sign
# (comparable to writing x <- 2 in R)

x = 2

x

2

In [2]:
y = 3 + 2

y

5

In [3]:
# List all the variables in the notebook

%whos

Variable   Type    Data/Info
----------------------------
x          int     2
y          int     5


* When naming variables in Python, follow these rules and conventions to keep your code valid and readable.
    * A variable name must begin with a letter (a-z, A-Z) or an underscore (_); it cannot start with a digit. The remaining characters may be letters, digits, or underscores.
    * Python keywords such as `if`, `else`, and `for` cannot be used as variable names. Built-in names such as `type` can technically be reassigned, but doing so hides the built-in, so avoid it.
    * By convention, variable names are written in lowercase, with underscores separating the words of a multi-word name. This style is known as snake_case and is widely adopted in the Python community.

In [4]:
age = 25
user_name = "Christina"
total_score = 100
is_valid_input = True

In [5]:
my_variable1 = 2

my_variable1 

2

In [6]:
1variable = 3

SyntaxError: invalid decimal literal (752939674.py, line 1)

In [7]:
_variable = 2

_variable

2
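
* The naming rules above can be checked programmatically. The following is a minimal sketch that uses the string method `isidentifier` and the standard-library `keyword` module; the variable names on the left are illustrative and not part of the lecture notebook.

```python
import keyword

# Illustrative names, not part of the lecture notebook.
# str.isidentifier() reports whether a string follows the naming rules.
valid_name = "my_variable1".isidentifier()
starts_with_digit = "1variable".isidentifier()
leading_underscore = "_variable".isidentifier()

print(valid_name)          # True
print(starts_with_digit)   # False
print(leading_underscore)  # True

# keyword.iskeyword() reports whether a name is reserved by Python.
print(keyword.iskeyword("if"))   # True
print(keyword.iskeyword("age"))  # False
```

Note that `isidentifier` only checks the characters of the name, so a keyword has to be detected separately with `keyword.iskeyword`.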

* In Python, the fundamental data types include:

    * Integers (`int`): Represents whole numbers without any decimal points.

    * Floating-point numbers (`float`): Represents numbers with decimal points.

    * Strings (`str`): Represents sequences of characters, such as text.

    * Booleans (`bool`): Represents either True or False values.

    * NoneType (`None`): Represents the absence of a value or a null value.

In [8]:
x = 5

type(x)

int

In [9]:
a = 3.14

type(a)

float

In [10]:
# Strings are enclosed in either single (') or double (") quotes. 

name = "John"

type(name)

str
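
* The remaining two fundamental types, `bool` and `NoneType`, can be inspected in the same way. This is a small sketch with illustrative variable names that do not appear elsewhere in the lecture.

```python
# Illustrative names, not part of the lecture notebook.
has_passed = True
print(type(has_passed))     # <class 'bool'>

# A comparison produces a bool
is_adult = 25 >= 18
print(is_adult)             # True

# None stands for "no value"
missing_value = None
print(type(missing_value))  # <class 'NoneType'>

# The usual way to test for None is the "is" operator
print(missing_value is None)  # True
```

When `type(...)` is wrapped in `print`, the output has the form `<class 'bool'>` rather than the shorter `bool` that a notebook shows for the last expression in a cell.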

The `len` function in Python is used to get the length or the number of items in an object. When working with strings, `len` returns the number of characters in the string.

In [11]:
my_string = "Hello, World!"

length = len(my_string)

length 

13

* Each character in a string has an index that identifies its position. Indexing starts at 0, so the first character is at index 0.

In [12]:
my_string = "Python"
first_character = my_string[0]  # accessing the first character using index
second_character = my_string[1]  # accessing the second character using index

print(f"The first character is: {first_character}")
print(f"The second character is: {second_character}")

The first character is: P
The second character is: y


* A string literal prefixed with `f`, as in `print(f"...")`, is called an f-string, which stands for "formatted string literal." It's a concise and readable way to embed expressions or variables within a string.
    * Inside the string, you can use curly braces `{}` to enclose expressions or variables.
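
* Curly braces can hold a full expression, not just a variable name. The sketch below uses made-up values (`item`, `price`, `quantity`) to illustrate this, along with an optional format specifier after a colon.

```python
# Illustrative values, not part of the lecture notebook.
item = "notebook"
price = 2.5
quantity = 4

# The expression inside the braces is evaluated before it is inserted
message = f"{quantity} {item}s cost {price * quantity} dollars"
print(message)    # 4 notebooks cost 10.0 dollars

# A format specifier after a colon controls how the value is displayed;
# .2f means two digits after the decimal point
formatted = f"Total: {price * quantity:.2f}"
print(formatted)  # Total: 10.00
```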

* In Python, `slicing` is a technique used to extract a portion of a sequence, such as a string, list, or tuple. The syntax for slicing is `[start:end:step]`, where:

    * `start` is the index of the first element you want.
    * `end` is the index of the element just after the last element you want.
    * `step` is the increment between consecutive selected indices (optional, default is 1).
    
* The slicing is inclusive on the left side (at the start index) and exclusive on the right side (up to, but not including, the end index).

In [13]:
my_string

'Python'

In [14]:
my_string[1:4]

'yth'

In [15]:
my_string[1:4:1]

'yth'

In [16]:
my_string[1:4:2]

'yh'

* If you omit `start`, `end`, or `step` in a slicing operation, Python uses a default value for it. With a positive `step`, the defaults are as follows:

    * `start` defaults to the beginning of the sequence.
    * `end` defaults to the end of the sequence.
    * `step` defaults to 1.

In [17]:
my_string[:4]

'Pyth'

In [18]:
my_string[1:]

'ython'

* When slicing strings, you have the option to use negative indices, indicating counting from the end of the string. For instance, -1 represents the last character, -2 denotes the second-to-last character, and so forth.

In [19]:
my_string

'Python'

In [20]:
my_string[2:-2]

'th'

In [21]:
my_string = "Python"

# Slicing with a negative step (-1) to pass through the sequence 
# in reverse

my_string[::-1]

'nohtyP'
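
* A negative index `-k` refers to the same position as `len(s) - k`. The sketch below checks this on an illustrative string named `word` and combines negative indices with omitted bounds.

```python
# Illustrative name, not part of the lecture notebook.
word = "Python"

last_char = word[-1]
print(last_char)                         # n
print(last_char == word[len(word) - 1])  # True

last_three = word[-3:]        # from the third-to-last character to the end
all_but_last_three = word[:-3]  # everything before the third-to-last character
every_other = word[::2]       # indices 0, 2, 4

print(last_three)          # hon
print(all_but_last_three)  # Pyt
print(every_other)         # Pto
```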

* In Python, a string is an object equipped with various methods for manipulation (objects and their methods belong to object-oriented programming, which we'll explore later). To call one of these methods, follow the pattern `my_string.method_name()`. String methods return a new value and leave the original string unchanged.

In [22]:
# Turn my_string to upper case
my_string.upper()

'PYTHON'

In [23]:
my_string

'Python'

In [24]:
my_string.lower()

'python'

In [25]:
# Checks if the string starts with a specified prefix

my_string.startswith("P")

True

In [26]:
# Finds the index of the first occurrence of a substring

my_string.find("th")

2

In [27]:
my_string

'Python'
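
* As the cells above show, `my_string` is still `'Python'` after calling `upper()` and `lower()`. The following sketch makes the same point with two other string methods, `strip` and `replace`; the variable names are illustrative.

```python
# Illustrative names, not part of the lecture notebook.
greeting = "  Hello, World!  "

# strip() returns a copy without leading and trailing whitespace
stripped = greeting.strip()

# replace() returns a copy with the substring swapped out
replaced = stripped.replace("World", "Python")

print(stripped)  # Hello, World!
print(replaced)  # Hello, Python!

# The original string is unchanged
print(greeting == "  Hello, World!  ")  # True

# find() returns -1 when the substring is not present
not_found = replaced.find("xyz")
print(not_found)  # -1
```

To keep the result of a string method, assign it to a variable, as done with `stripped` and `replaced` here.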

### List
<hr style="border:2px solid gray">

* Here are a few key points about lists in Python:

    * Ordered Collection: Lists are ordered collections of items, and each item has an index that denotes its position in the list.

    * Mutable: Lists are mutable, which means you can modify their elements by assigning new values, adding or removing items.

    * Heterogeneous Elements: A list can contain elements of different data types, including numbers, strings, or even other lists.

    * Defined using Square Brackets: Lists are defined using square brackets `[]`. 


    * Common Operations: Lists support common operations like concatenation (+), repetition (*), and the len() function.

    * Methods: Lists come with a variety of built-in methods for operations like adding, removing, sorting, and more.

    * Dynamic Size: Lists in Python can grow or shrink dynamically. You can add or remove elements as needed.

In [28]:
list_1 = [1, 2, 3]

list_1

[1, 2, 3]

In [29]:
type(list_1)

list

In [30]:
list_2 = ["Apple", "Orange"]

list_2

['Apple', 'Orange']

In [31]:
# Concatenation
list_3 = list_1 + list_2

list_3

[1, 2, 3, 'Apple', 'Orange']

In [32]:
len(list_3)

5

In [33]:
# How about this one?

list_4 = [list_1, list_2]

list_4

[[1, 2, 3], ['Apple', 'Orange']]

In [34]:
len(list_4)

2

In [35]:
list_3

[1, 2, 3, 'Apple', 'Orange']

In [36]:
list_3[:2]

[1, 2]

In [37]:
list_3[-1]

'Orange'

* New items can be added to the end of an existing list by using the `append` method.

In [38]:
list_3

[1, 2, 3, 'Apple', 'Orange']

In [39]:
list_3.append('4')

list_3

[1, 2, 3, 'Apple', 'Orange', '4']

* You have the option to create an `empty list` initially and then add new elements to it later using the `append` method.

In [40]:
my_list = []  # Creating an empty list

my_list.append(1)  # Adding the element 1 to the list

my_list.append("two")  # Adding the string "two" to the list

my_list

[1, 'two']

* The `insert` method adds an element at a specific position in a list.

    * The syntax is `insert(index, element)`.

        * The new element is placed at the given index.
        * The existing elements at and after that index shift one position to the right to accommodate the new element.

In [41]:
my_list.insert(1, "three")

my_list

[1, 'three', 'two']

In [42]:
# We can check if an element is in the list using the operator "in"

"three" in my_list

True

In [43]:
"four" in my_list

False
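
* Several of the key points listed at the start of this section (repetition, modifying an element, removing items, sorting) are not shown in the cells above. Here is a minimal sketch using an illustrative list called `numbers`.

```python
# Illustrative names, not part of the lecture notebook.
numbers = [3, 1, 2]

# Repetition creates a new list; numbers itself is unchanged
repeated = numbers * 2
print(repeated)  # [3, 1, 2, 3, 1, 2]

# Mutability: assign a new value to an existing position
numbers[0] = 10
print(numbers)   # [10, 1, 2]

# remove() deletes the first occurrence of a value
numbers.remove(1)
print(numbers)   # [10, 2]

# pop() removes the last element and returns it
last = numbers.pop()
print(last)      # 2
print(numbers)   # [10]

# extend() adds every element of another list; sort() orders in place
numbers.extend([7, 4])
numbers.sort()
print(numbers)   # [4, 7, 10]
```

Unlike string methods, list methods such as `remove`, `extend`, and `sort` change the list itself instead of returning a new one.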

### Tuple
<hr style="border:2px solid gray">

* Here are some key points about tuples in Python:

    * Ordered Collection: Tuples are ordered collections of elements, similar to lists, and each element has an index.

    * Immutable: Tuples are immutable, meaning once they are created, their elements cannot be modified or changed. However, you can create a new tuple with modifications.

    * Heterogeneous Elements: Like lists, tuples can contain elements of different data types.

    * Defined using Parentheses: Tuples are usually written with parentheses `()`, although it is the commas between the values that actually create the tuple.

    * Common Operations: Tuples support common operations like concatenation (+), repetition (*), and the len() function.

    * Single Element Tuple: A tuple with a single element must have a trailing comma. For example: `single_element_tuple = (42,)`

    * Packing and Unpacking: Tuples support packing multiple values into a single tuple and unpacking values into multiple variables.

In [44]:
# This is NOT a tuple

single_element_tuple = (42)

type(single_element_tuple)

int

In [45]:
# This is a tuple

single_element_tuple = (42,)

type(single_element_tuple)

tuple

In [46]:
# Another example 

my_tuple = (2, 4, 6)

my_tuple

(2, 4, 6)

In [47]:
type(my_tuple)

tuple

In [48]:
# Immutable

my_tuple[2] = 8

TypeError: 'tuple' object does not support item assignment

In [49]:
len(my_tuple)

3
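
* Because a tuple cannot be changed in place, a "modified" tuple is really a new tuple built from pieces of the old one. The sketch below shows one way to do this with slicing and concatenation; the names `original` and `updated` are illustrative.

```python
# Illustrative names, not part of the lecture notebook.
original = (2, 4, 6)

# Take the first two elements and concatenate a one-element tuple
updated = original[:2] + (8,)

print(original)  # (2, 4, 6)
print(updated)   # (2, 4, 8)

# Repetition also produces a new tuple
repeated = original * 2
print(repeated)  # (2, 4, 6, 2, 4, 6)
```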

* `Packing` refers to the process of combining multiple values into a single tuple. When you write values separated by commas, Python automatically packs them into a tuple, even without parentheses.

In [50]:
packed_tuple = 1, 2, "three"

packed_tuple

(1, 2, 'three')

In [51]:
type(packed_tuple)

tuple

* `Unpacking` is the opposite of packing. It involves extracting values from a tuple and assigning them to individual variables.

In [52]:
my_tuple = (1, 2, "three")

a, b, c = my_tuple

print(a, b, c)

1 2 three


In [53]:
type(a)

int

In [54]:
my_tuple = (1, 2, 3, 4, 5)

# Unpacking with the * (asterisk) operator
first, *rest = my_tuple # capture multiple elements into a single variable

print(first)  # Output: 1
print(rest)   # Output: [2, 3, 4, 5]

1
[2, 3, 4, 5]
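
* A common use of packing and unpacking together is swapping two variables in a single line. The sketch below uses illustrative names, and also shows that the starred variable may appear in the middle of the unpacking pattern.

```python
# Illustrative names, not part of the lecture notebook.
left = "apple"
right = "orange"

# The right-hand side is packed into a tuple, then unpacked on the left
left, right = right, left
print(left, right)  # orange apple

# The starred name collects whatever is not matched by the other names
head, *middle, tail = (1, 2, 3, 4, 5)
print(head, middle, tail)  # 1 [2, 3, 4] 5
```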


* The `zip` function in Python combines two or more iterables (such as lists) by pairing the elements at the same position in each iterable, producing one tuple per position.

In [55]:
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']

# Using zip to create tuples from two lists
zipped_tuples = zip(list1, list2)

# Converting the result to a list for better visibility
result_list = list(zipped_tuples)

print(result_list)

[(1, 'a'), (2, 'b'), (3, 'c')]
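
* Two further behaviors of `zip` are worth knowing: it stops at the end of the shortest input, and its pairs can be passed to `dict` to build a dictionary (dictionaries are covered later in this lecture). A minimal sketch with illustrative names:

```python
# Illustrative names, not part of the lecture notebook.

# zip stops as soon as the shortest iterable runs out
short_pairs = list(zip([1, 2, 3], ["a", "b"]))
print(short_pairs)  # [(1, 'a'), (2, 'b')]

# Each (key, value) pair becomes one entry of the dictionary
keys = ["name", "age", "city"]
values = ["John", 25, "New York"]
person = dict(zip(keys, values))
print(person)  # {'name': 'John', 'age': 25, 'city': 'New York'}
```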


### Set
<hr style="border:2px solid gray">

* Here are some key points about sets in Python:

    * Unordered Collection: Sets are an unordered collection of unique elements. Each element in a set must be unique.

    * Mutable: Sets are mutable, meaning you can add and remove elements after the set is created.

    * No Duplicate Elements: Sets automatically enforce uniqueness, so duplicate elements are not allowed.

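    * Defined using Curly Braces: Sets are defined using curly braces `{}` that contain comma-separated elements rather than key-value pairs. Empty braces `{}` create a dictionary, so use `set()` to create an empty set.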


    * No Indexing: Sets do not support indexing or slicing because they are unordered.

In [56]:
my_set = {1, 2, 3, 4}

my_set.add(5)

my_set

{1, 2, 3, 4, 5}

In [57]:
len(my_set)

5

In [58]:
my_set.remove(2)

my_set

{1, 3, 4, 5}

In [59]:
my_set.add(1)

my_set

{1, 3, 4, 5}

In [60]:
my_set[2]

TypeError: 'set' object is not subscriptable
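
* Two practical details about sets: an empty set is created with `set()`, and the `discard` method removes an element without raising an error when the element is absent (whereas `remove` raises a `KeyError` in that case). The sketch below uses illustrative names.

```python
# Illustrative names, not part of the lecture notebook.
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

colors = {"red", "green"}

# "blue" is not in the set, and discard() silently does nothing
colors.discard("blue")
print(len(colors))  # 2

# Membership tests work the same way as for lists
has_red = "red" in colors
print(has_red)      # True
```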

### Dictionary
<hr style="border:2px solid gray">

* Here are some key points about dictionaries in Python:

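    * Insertion-Ordered Collection: Dictionaries are collections of key-value pairs that, since Python 3.7, preserve the order in which keys were inserted. Each key must be unique within a dictionary, and items are accessed by key rather than by position.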

    * Mutable: Dictionaries are mutable, meaning you can add, modify, or remove key-value pairs after the dictionary is created.

    * Key-Value Pairs: Each item in a dictionary consists of a key and its corresponding value, separated by a colon. For example: `my_dict = {'name': 'John', 'age': 25}`

    * Defined using Curly Braces: Dictionaries are defined using curly braces `{}`.

    * Accessing Values: Values in a dictionary are accessed using their respective keys.

    * Common Operations: Dictionaries support common operations like adding items (`my_dict['key'] = 'value'`) and checking for the existence of keys (`'key' in my_dict`).

In [61]:
my_dict = {'name': 'John', 'age': 25, 'city': 'New York'}

print(my_dict)

{'name': 'John', 'age': 25, 'city': 'New York'}


In [62]:
type(my_dict)

dict

In [63]:
# We need to use the key of the element

my_dict['name']

'John'

In [64]:
# Get all the keys in a dictionary by using the keys method

my_dict.keys()

dict_keys(['name', 'age', 'city'])

In [65]:
# Get all the values in a dictionary by using the values method

my_dict.values()

dict_values(['John', 25, 'New York'])

In [66]:
# Get the size of a dictionary by using the len function

len(my_dict)

3

* You can create an empty dictionary by using curly braces `{}`. After creating an empty dictionary, you can add key-value pairs to it later by assigning values to specific keys. This is a common approach when you don't know the dictionary's content upfront, and you want to build it dynamically by adding elements later in your code.

In [67]:
my_dict = {}

my_dict

{}

In [68]:
my_dict['name'] = 'John'

my_dict['age'] = 25

my_dict

{'name': 'John', 'age': 25}

In [69]:
# Is the key "city" in this dictionary?

"city" in my_dict

False

In [70]:
"name" in my_dict

True
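
* The key points at the start of this section also mention modifying and removing key-value pairs. Here is a minimal sketch of both, plus the `get` method, which returns a default instead of raising an error when a key is missing. The dictionary `profile` is illustrative.

```python
# Illustrative names, not part of the lecture notebook.
profile = {"name": "John", "age": 25}

# Assigning to an existing key overwrites its value
profile["age"] = 26

# Assigning to a new key adds a pair
profile["city"] = "New York"

# pop() removes a key and returns its value
removed = profile.pop("name")
print(removed)  # John
print(profile)  # {'age': 26, 'city': 'New York'}

# get() returns None (or a chosen default) for a missing key
missing = profile.get("name")
fallback = profile.get("name", "unknown")
print(missing)   # None
print(fallback)  # unknown
```

By contrast, `profile["name"]` would raise a `KeyError` at this point, because the key no longer exists.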

Use `my_dict.items()` to get a view of all key-value pairs in the dictionary as tuples.

In [71]:
my_dict

{'name': 'John', 'age': 25}

In [72]:
my_dict.items()

dict_items([('name', 'John'), ('age', 25)])
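
* The objects returned by `items()`, `keys()`, and `values()` are views, not lists. The sketch below, with an illustrative dictionary named `record`, converts a view to a list so it can be indexed, and shows that a view reflects later changes to the dictionary.

```python
# Illustrative names, not part of the lecture notebook.
record = {"name": "John", "age": 25}

# Convert the view to a list to index into it
pairs = list(record.items())
print(pairs)  # [('name', 'John'), ('age', 25)]

# Each pair is a tuple, so it can be unpacked
first_key, first_value = pairs[0]
print(first_key, first_value)  # name John

# A view stays connected to the dictionary it came from
keys_view = record.keys()
record["city"] = "New York"
print(list(keys_view))  # ['name', 'age', 'city']
```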

### HW 2

1. What value will y have after the following lines of code are executed?

In [None]:
x = 2
y = x + 3
x = 3
y

2. Extract the last word, "great", from the string "Python is great!". Hint: look up how to split a string into a list of substrings based on spaces.

3. Create a dictionary that has the keys "E", "F", "G" with the values "2", "3", "-1", respectively. Print all the keys in the dictionary.

4. Create a Python list that contains four elements: `my_arr = [1, 2, 3, 4]`. Reverse the order of the elements in the list using the slicing technique we discussed: `[start:end:step]`. Print both the original list and the reversed list.

5. Try to modify an element in a tuple and observe the error. Explain the error message. 

6. Get the unique elements from the tuple `(2, 3, 2, 3, 1, 2, 5)` using an appropriate data structure.

7. Create two tuples, (1, 2, 3) and (4, 5, 6). Concatenate these tuples and print the result.

8. Given a Python list `[1, 2, 3, 4, 5]`, perform the following operations: (a) append the number 6 to the end of the list and (b) insert the number 0 at the beginning of the list. Print the modified list.


9. Divide the given list into three equal parts, ensuring that your code accommodates lists of varying sizes.

In [None]:
sample_list = [11, 32, 8, 23, 14, 29, 78, 45, 89]


10. Find the intersection (the common elements) of two sets and remove those elements from the first set. You can use the `intersection()` and `remove()` methods of a set.

In [None]:
first_set = {23, 42, 65, 57, 78, 83, 29}

second_set = {57, 83, 29, 67, 73, 43, 48}