## Intro to Python and Pandas

Python is:
- Object-oriented
- Dynamically typed
- Whitespace-sensitive (indentation defines code blocks)

Fun fact: Python is named after Monty Python, not the reptile.
***
At the time of writing, the latest Python version is **3.10.4**.

Python is available to download at: [Download](https://www.python.org/downloads/)  
Documentation is available here: [Documentation](https://docs.python.org/3/)

We will mostly use Jupyter notebooks throughout this course.
You may use whichever IDE you prefer. I suggest:
- [PyCharm](https://www.jetbrains.com/pycharm/) by JetBrains. *Note*: You need a Pro license for Jupyter notebooks, available for free with an edu email address
- [VSCode](https://code.visualstudio.com/) by Microsoft
- [Anaconda](https://www.anaconda.com/products/individual) by Anaconda
***
## Topics:
1. Variables and basic operations
2. More data structures  
    2.1 Lists  
    2.2 Tuples  
    2.3 Dictionaries  
    2.4 Sets  
3. Control flow statements  
    3.1 If, elif, and else statements  
    3.2 For and while loops  
    3.3 Error handling
4. Defining functions
5. Defining object classes
6. User input and file reading/writing

### Variables and Data Types

In [None]:
my_string = 'hello world'
my_int = 2021
my_float = 101.234
my_bool = True
my_null = None

In [None]:
isinstance(my_string, int)

In [None]:
type(my_string)

In [None]:
type(my_bool)

In [None]:
type(my_null)

In [None]:
my_string

In [None]:
print(my_string)

In [None]:
my_string = 5

In [None]:
type(my_string)

In [None]:
casted = str(my_string)

In [None]:
casted

In [None]:
type(casted)

In [None]:
casted = float(my_int)
casted

In [None]:
int(9.000001), int(9.99999)
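
The cell above suggests that `int()` drops the fractional part instead of rounding. As a rough sketch (the variable names here are only for illustration), this compares `int()` with `math.floor()` and `round()`, including a negative number where truncation and flooring differ:

```python
import math

# int() truncates toward zero; it never rounds.
truncated_pos = int(9.99999)    # 9
truncated_neg = int(-9.99999)   # -9

# math.floor() always rounds down (toward negative infinity).
floored_neg = math.floor(-9.99999)  # -10

# round() returns the nearest integer.
rounded = round(9.99999)  # 10

print(truncated_pos, truncated_neg, floored_neg, rounded)
```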

### Truthy and Falsy Values

In [None]:
bool(0), bool(0.0), bool(''), bool([]), bool({}), bool(None)

In [None]:
bool(1), bool(-1.5), bool('False'), bool([[]]), bool({0}), bool([None])

In [None]:
my_int = []

if my_int:
    print('here')
else:
    print('not here')

### Arithmetic Operations

In [None]:
print(5 + 3)  # Addition
print(5 - 3)  # Subtraction
print(5 * 3)  # Multiplication
print(5 / 3)  # Floating point division
print(5 // 3) # Floor (integer) division
print(5 ** 3) # Exponentiation
print(5 % 3)  # Modulus (remainder)

In [None]:
# Python does not have increment/decrement operators
i = 1

# i++
# i--

# Instead, Python uses augmented assignment
i += 1

In [None]:
# The other arithmetic operators can be used similarly
i /= 2
i

In [None]:
i *= 5
i

### String Methods

In [None]:
my_str1 = 'University of North Carolina at'
my_str2 = 'Charlotte'

In [None]:
my_str1 + my_str2

In [None]:
out_str = my_str1 + ' ' + my_str2

In [None]:
out_str

In [None]:
my_float = round(my_float, 1)
my_float

In [None]:
f'${round(my_float, 2)}'

In [None]:
out_str2 = f'{my_str1}\n{my_str2}'
print(out_str2)

In [None]:
len(out_str)

In [None]:
equal_len = len(out_str) == len(my_str1) + len(my_str2) + 1
equal_len

In [None]:
out_str.upper()

In [None]:
out_str

In [None]:
out_str.capitalize()

In [None]:
out_str.startswith('Uni')

In [None]:
out_str.split()

In [None]:
out_str.split('o')

In [None]:
out_str += '       '

out_str

In [None]:
out_str = out_str.strip()

In [None]:
'Char' in out_str

In [None]:
'char' in out_str.lower()

### Indexing and Slicing

Strings are sequences of characters. You can index individual characters and use slicing to access substrings.

In [None]:
first_ltr = out_str[0]

In [None]:
first_ltr

In [None]:
type(first_ltr)

In [None]:
out_str[0:5]

In [None]:
out_str[:5]

In [None]:
out_str[5:len(out_str)]

In [None]:
out_str[5:]

In [None]:
out_str[-1]

In [None]:
out_str[:-3]

In [None]:
out_str[-3:]

In [None]:
out_str[2:10:2]

In [None]:
out_str[::-1]

In [None]:
out_str

In [None]:
my_list = [1, 2, 3]
my_list_copy = my_list[:]
my_list_2 = my_list

In [None]:
print(my_list)
print(my_list_copy)

In [None]:
my_list is my_list_copy

In [None]:
my_list is my_list_2

In [None]:
my_list == my_list_copy

In [None]:
my_list_2[1] = -1

In [None]:
my_list
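
The cells above contrast `is` (same object) with `==` (same contents). Plain assignment creates a second name for the same list, while slicing with `[:]` creates a new list. A minimal sketch with illustrative variable names:

```python
original = [1, 2, 3]
alias = original            # a second name for the same list object
shallow_copy = original[:]  # a new list with the same elements

alias[1] = -1               # mutating through the alias changes the original

print(original)       # [1, -1, 3]
print(shallow_copy)   # [1, 2, 3]
print(alias is original, shallow_copy is original)  # True False
```

Note that `[:]` makes a shallow copy: nested lists inside the copy would still be shared with the original.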

### Command Line Input

In [None]:
user_in = input('Enter something: ')

In [None]:
user_in

In [None]:
type(user_in)

### Importing modules/libraries

In [None]:
import math
import re
import statistics as stats
from random import shuffle, randint, choices

In [None]:
my_int = randint(0, 3)  # Unlike most other operations, randint is *inclusive* of the right value
my_int
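
To see the inclusive upper bound in practice, the sketch below draws many values with `randint` and with `randrange`, which excludes the upper bound the way `range` does. The seed and the number of draws are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the draws are repeatable

# randint(0, 3) can return 0, 1, 2, or 3.
draws_randint = {random.randint(0, 3) for _ in range(1000)}

# randrange(0, 3) follows range() semantics and never returns 3.
draws_randrange = {random.randrange(0, 3) for _ in range(1000)}

print(sorted(draws_randint))
print(sorted(draws_randrange))
```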

In [None]:
out_str

In [None]:
result = re.search('char', out_str.lower())
result

In [None]:
rand_nums = choices([1, 2, 3, 4, 5], k=5)
rand_nums

In [None]:
stats.median(rand_nums)

## Practice Problems

Prompt the user to enter a number. Store the input as an integer variable.
1. Print the variable value and its type.
2. Print the cube of the number.
3. Store in a boolean variable whether the number is even.

Using the string defined below:
1. Capitalize the string and add a period to the end. Store the result in the original variable.
2. Print the length of the string, the first 5 characters, and the last 4 characters.
3. Print whether the string contains 'cat' and whether it ends with 'dog'.
4. Use slicing to create a copy of the string in reverse order.
5. Split the string on whitespace. How many words are in this sentence?

In [None]:
sentence = 'the quick brown fox jumps over the lazy dog'

Swap the two variables without using a third temporary variable.

In [None]:
a = 5
b = 'test'

In [None]:
a, b = b, a

## Control flow

### Boolean logic

In [None]:
tmp = 'hello'

In [None]:
tmp == 'goodbye'

In [None]:
tmp != 'goodbye'

In [None]:
'll' in tmp

In [None]:
my_nums = [1, 2, 3.5]
1 in my_nums

In [None]:
'z' not in tmp

In [None]:
'z' not in tmp and 'a' in tmp

In [None]:
'z' not in tmp or 'a' in tmp

In [None]:
test_num = 7

if 5 < test_num < 10:
    print('between')

if test_num < 5:
    print('less than 5')
elif test_num < 10:
    print('between 5 and 10')
elif test_num <= 100:
    print('between 10 and 100 (inclusive)')
else:
    print('greater than 100')

### Loops

In [None]:
my_data = list(range(10, 25))

my_data

In [None]:
for num in my_data:
    print(num)
    
    if num % 2 == 0:
        break

In [None]:
for num in my_data:
    if num % 2 == 0:
        continue
    print(num)

In [None]:
i = 0
while i < 5:
    print(i)
    i += 1

## Error Handling

In [None]:
my_dict = {'key1': 'val1', 2: 3.5}

try:
    tmp = my_dict['foobar']
except KeyError as e:
    print(f'Key not in the dictionary! {e}')
except NameError:
    print('dict does not exist')
finally:
    print('Done!')
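
A `try` statement can also take an optional `else` clause, which runs only when no exception was raised. The sketch below uses a hypothetical helper, `safe_lookup`, to record which branches run in each case. It is one way to illustrate the control flow, not a recommended lookup pattern (`dict.get` is usually simpler).

```python
def safe_lookup(mapping, key):
    """Return the value for key (or None) plus a log of the branches taken."""
    log = []
    try:
        value = mapping[key]
    except KeyError:
        log.append('missing')   # runs only if the key is absent
        value = None
    else:
        log.append('found')     # runs only if no exception was raised
    finally:
        log.append('done')      # always runs
    return value, log


print(safe_lookup({'key1': 'val1'}, 'key1'))    # ('val1', ['found', 'done'])
print(safe_lookup({'key1': 'val1'}, 'foobar'))  # (None, ['missing', 'done'])
```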


## More Data Structures: Collections

### Lists

*Ordered* and *mutable* and *indexed* and *heterogeneous*

In [None]:
fruits = ['apple', 'banana', 'orange', 'peach', 'watermelon']

In [None]:
fruits

In [None]:
len(fruits)

In [None]:
fruits[0]

In [None]:
fruits[::2]

In [None]:
fruits[1][0]
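
The `fruits` list holds only strings, so it does not show the *mutable* and *heterogeneous* properties. A small sketch with an illustrative list named `mixed`:

```python
# Heterogeneous: one list can hold values of different types.
mixed = [1, 'two', 3.0, [4, 5], None]

# Mutable: elements can be replaced and the list can grow in place.
mixed[1] = 2
mixed.append('six')

# Indexed and ordered: positions are stable and start at 0.
first, last = mixed[0], mixed[-1]

kinds = [type(item).__name__ for item in mixed]
print(mixed)
print(kinds)
```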

### Iterating over lists

In [None]:
for fruit in fruits:
    print(fruit)

In [None]:
for i, fruit in enumerate(fruits):
    print(i, fruit)

In [None]:
for i in range(len(fruits)):
    print(i, fruits[i])

In [None]:
for fruit in fruits:
    for ltr in fruit:
        print(ltr, end=' ')
    print()

In [None]:
nums = [5, 4, 3, 2, 1]

for n, fruit in zip(nums, fruits):
    print(n, fruit)

***

### List Functions and Methods

In [None]:
range(5)

In [None]:
nums = list(range(5))
nums

In [None]:
list(range(2, 20, 3))

In [None]:
shuffle(nums)
nums

In [None]:
# Will this work?
shuffle(out_str)

In [None]:
rev_nums = reversed(nums)

In [None]:
rev_nums

In [None]:
print(f'Orig nums: {nums}')
print(f'Rev nums: {list(rev_nums)}')

In [None]:
nums.reverse()

In [None]:
print(f'Nums: {nums}')

In [None]:
sorted_nums = sorted(nums, reverse=True)

In [None]:
print(f'Orig nums: {nums}')
print(f'Sorted nums: {sorted_nums}')

In [None]:
nums.sort(reverse=True)

In [None]:
print(f'Nums: {nums}')

In [None]:
nums.append(15)

In [None]:
nums.insert(1, 25)

In [None]:
nums

In [None]:
del nums[0]

In [None]:
nums

In [None]:
val = nums.pop(0)

In [None]:
print(val)
print(nums)

In [None]:
sum(nums)

In [None]:
max(nums)

In [None]:
min(nums)

## Tuples

*Ordered* and *immutable* and *indexed*

In [None]:
my_tuple = (2, 'fifteen', True)

In [None]:
my_tuple

In [None]:
my_tuple[0]

In [None]:
my_tuple[0] = 5  # Raises TypeError: tuples do not support item assignment
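
Assigning to a tuple element raises a `TypeError`. The sketch below catches that error, then shows two things tuples do support: unpacking into variables and building a new tuple from an old one. The names `point` and `updated` are only illustrative.

```python
point = (2, 'fifteen', True)

# Item assignment is not allowed on tuples.
try:
    point[0] = 5
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print(f'Cannot modify a tuple: {err}')

# Unpacking assigns each element to its own variable.
count, label, flag = point

# To "change" a tuple, build a new one from pieces of the old one.
updated = (5,) + point[1:]
print(updated)  # (5, 'fifteen', True)
```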

## Sets

*Unordered* and *mutable* and *unindexed*

**Frozensets** are the immutable version of sets

In [None]:
my_set = {1, 2, 4, 4, 4, 4, 6, 7, 3}
my_set

In [None]:
my_list = [1, 1, 1, 2, 2, 2, 3, 3, 3]

list(set(my_list))

In [None]:
my_set.add(15)
my_set.remove(4)

In [None]:
my_set

In [None]:
my_set2 = set(range(5, 20, 2))
my_set2

In [None]:
print(my_set.intersection(my_set2))
print(my_set & my_set2)

In [None]:
print(my_set.union(my_set2))
print(my_set | my_set2)

In [None]:
my_set
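
The note at the top of this section mentions frozensets. As a brief sketch (the variable names are illustrative), a `frozenset` supports the same read-only set operations as a `set`, but it cannot be modified. Because it is hashable, it can serve as a dictionary key.

```python
frozen = frozenset({1, 2, 3})

# Read-only operations such as intersection work as they do on sets.
overlap = frozen & {2, 3, 4}

# Frozensets are hashable, so they can be used as dictionary keys.
lookup = {frozen: 'immutable key'}

# Mutating methods such as add() do not exist on frozensets.
try:
    frozen.add(4)
    can_add = True
except AttributeError:
    can_add = False

print(overlap, lookup[frozenset({3, 2, 1})], can_add)
```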

## Dictionaries

*Ordered* (by insertion, since Python 3.7) and *mutable* and *keyed* (accessed by key, not by position)

In [None]:
my_dict = {
    'name': 'Josh Melton',
    'school': 'UNC Charlotte',
    'year': 2022,
    'is_student': True,
    (0, 0): {
        1: 'Test',
        2: 2
    },
    frozenset({1, 2}): 'test'
}

my_dict

### Accessing elements by keys

In [None]:
my_dict['name']

In [None]:
my_dict.get('name')

In [None]:
my_dict['foobar']  # Raises KeyError: the key does not exist

In [None]:
my_dict.get('foobar', -1)

In [None]:
test_key = (0, 0)

key_exists = test_key in my_dict

In [None]:
key_exists

In [None]:
my_dict['new_key'] = [1, True, -3.5, 'aabba']
my_dict

In [None]:
my_dict['new_key'] = None

In [None]:
print(len(my_dict))
my_dict

In [None]:
del my_dict['new_key']

In [None]:
print(len(my_dict))
my_dict

### Iterating over dictionaries

In [None]:
for k in my_dict:
    print(k)

In [None]:
for k in my_dict.keys():
    print(k)

In [None]:
for v in my_dict.values():
    print(v)

In [None]:
for k, v in my_dict.items():
    print(f'Key: {k}\tValue: {v}')

In [None]:
list(my_dict.items())

## Practice Problems

Prompt the user to input an integer and store it in a variable. Ensure that the user input is a valid integer. If it is not, use a loop to re-prompt the user for input until a valid number is entered.

Using the list of integers below, reverse the list using **three** different means. Which of these alter the variable and which return a new object?

In [None]:
nums = list(range(10))
shuffle(nums)
nums

The data below contain the revenues and costs for a restaurant for the last four months.

Revenues:  
- December: $300,000
- November: $285,000
- October: $325,000
- September: $318,000

Costs:  
- December: $100,000
- November: $125,000
- October: $150,000
- September: $200,000

1. Create a list containing the revenue data and a list containing the cost data.
2. What was the average revenue of our restaurant over the four month period?
3. What was the total net profit for our restaurant?
4. Compute the profit for the restaurant for each month.

Instead of lists, use **one nested dictionary** to store our data. The outer key should be the month.

Compute monthly profit and store it as a new element in each month's entry.

There was an accounting error for the month of November. Update the revenue for November to be $290,000. Be sure to recompute the profit number for that month.

## List and Dictionary Comprehensions

In [None]:
nums = list(range(5))
nums_squared = [n**2 for n in nums]

In [None]:
nums_squared

Advanced comprehensions with if/else conditions

In [None]:
filtered_squares = [
    square
    for square in nums_squared
    if square % 2 == 0
]

In [None]:
filtered_squares

In [None]:
fancy_list = [
    ('even', square) if i % 2 == 0
    else ('odd', -square)
    for i, square in enumerate(nums_squared)
]

fancy_list

In [None]:
squares_dict = {n: n**2 for n in nums}
squares_dict

In [None]:
fancy_dict = {
    n: n**2 if i % 2 == 0 else -n
    for i, n in enumerate(nums)
}
fancy_dict
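
The position of `if` in a comprehension changes its meaning. A trailing `if` filters which items are kept, while an `if`/`else` expression before `for` chooses the value produced for every item. A short sketch comparing the two, with an equivalent explicit loop for the filter (variable names are illustrative):

```python
squares = [0, 1, 4, 9, 16]

# Trailing `if`: keeps only the items that pass the test.
evens_only = [s for s in squares if s % 2 == 0]

# `if`/`else` before `for`: produces one output per input item.
labels = ['even' if s % 2 == 0 else 'odd' for s in squares]

# The filter written as an ordinary loop, for comparison.
evens_loop = []
for s in squares:
    if s % 2 == 0:
        evens_loop.append(s)

print(evens_only)  # [0, 4, 16]
print(labels)      # ['even', 'odd', 'even', 'odd', 'even']
```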

## Defining functions

In [None]:
def my_func(a, b, c=1, do_add=True):
    if do_add:
        return a + b + c
    else:
        return a - b - c

In [None]:
num1 = 2
num2 = 3
num3 = 5
do_add = False

In [None]:
my_func(num1, num2)

In [None]:
my_func(num1, num2, do_add=do_add)

In [None]:
def placeholder():
    pass

In [None]:
fancy_dict.items()

In [None]:
sorted(fancy_dict.items(), key=lambda x: x[1])
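
The `key` argument accepts any function of one argument, and a `lambda` is just an unnamed function written inline. The sketch below sorts the same pairs with a named function and with a lambda. The `pairs` list is typed out by hand here so that the example runs on its own.

```python
pairs = [(0, 0), (1, -1), (2, 4), (3, -3), (4, 16)]


def second_item(pair):
    """Return the second element of a (key, value) pair."""
    return pair[1]


by_value_named = sorted(pairs, key=second_item)
by_value_lambda = sorted(pairs, key=lambda pair: pair[1])

print(by_value_named)  # [(3, -3), (1, -1), (0, 0), (2, 4), (4, 16)]
print(by_value_named == by_value_lambda)  # True
```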

## Defining object classes

In [None]:
class Dataset:
    def __init__(self, data, labels=None):
        self.data = data
        self.labels = labels
        
        self._private = 10
    
    def __len__(self):
        return len(self.data)  # labels may be None, so measure the data
    
    def get_private(self):
        return self._private
    
    def set_private(self, val):
        self._private = val
    
    def __repr__(self):
        return f'Data = {self.data}\nLabels = {self.labels}'
    
    def multiply_data(self, num=1):
        return [
            [el * num for el in row]
            for row in self.data
        ]

In [None]:
data = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0]
]
labels = [0, 0, 0, 1]

In [None]:
my_dataset = Dataset(data, labels)
print(my_dataset)

In [None]:
len(my_dataset)

In [None]:
my_dataset.multiply_data(2)

In [None]:
my_dataset.data
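
The `get_private`/`set_private` pair works, but Python code often exposes such attributes through `@property` instead. The sketch below uses a hypothetical `Threshold` class rather than `Dataset`, and the non-negative check is an assumption added only to show why a setter can be useful.

```python
class Threshold:
    def __init__(self, value=10):
        self._value = value  # leading underscore: internal by convention

    @property
    def value(self):
        """Read access: used as `obj.value`, with no parentheses."""
        return self._value

    @value.setter
    def value(self, new_value):
        """Write access: runs on `obj.value = ...` and can validate input."""
        if new_value < 0:
            raise ValueError('value must be non-negative')
        self._value = new_value


t = Threshold()
t.value = 25
print(t.value)  # 25

try:
    t.value = -1
    rejected = False
except ValueError:
    rejected = True
```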

## Reading and writing files

In [None]:
with open('dummy_txt.txt') as f:
    # Alternatives: f.read() returns the whole file as one string;
    # f.readlines() returns a list of lines.
    for line in f:
        print(repr(line))  # repr() makes the trailing '\n' visible

In [None]:
with open('dummy_txt.txt') as f:
    text = [line.strip() for line in f]

text

In [None]:
with open('dummy_txt.txt') as f:
    out_text = []
    for line in f:
        # do something with raw line input
        out_text.append(line)

In [None]:
sentence = 'This year is the year of the tiger'

In [None]:
with open('out.txt', 'w') as f_out:
    for word in sentence.split():
        f_out.write(f'{word}\n')

In [None]:
with open('out.txt', 'w') as f:
    f.write(
        '\n'.join(sentence.split())
    )
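
One way to check a write is to read the file back. The sketch below writes one word per line to a file in a temporary directory (so it does not depend on `dummy_txt.txt` or overwrite `out.txt`) and then reads the words back. The use of `tempfile` is an illustrative choice, not part of the course files.

```python
import os
import tempfile

sentence = 'This year is the year of the tiger'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.txt')

    # Write one word per line.
    with open(path, 'w') as f_out:
        f_out.write('\n'.join(sentence.split()))

    # Read the lines back, dropping the newline characters.
    with open(path) as f_in:
        words = [line.strip() for line in f_in]

print(words)
```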

### JSON files

In [None]:
import json

with open('dummy_json.json') as f:
    data_dict = json.load(f)

data_dict

In [None]:
with open('out.json', 'w') as f:
    json.dump(data_dict, f)  # the object comes first, then the file handle
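
The same conversion also works on strings through `json.dumps` and `json.loads`, which is handy for experimenting without files. The sketch below round-trips an illustrative dictionary and shows one common surprise: JSON object keys are always strings, so integer keys do not survive the round trip.

```python
import json

record = {'name': 'example', 'scores': [1, 2.5, None], 'active': True}

encoded = json.dumps(record, indent=2)  # Python object -> JSON string
decoded = json.loads(encoded)           # JSON string -> Python object

# Integer keys are converted to strings when encoded.
int_keys = json.loads(json.dumps({1: 'one'}))

print(decoded == record)  # True
print(int_keys)           # {'1': 'one'}
```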

## Practice Problems

Use a list comprehension to create a list of all the numbers in `nums` that are divisible by 11.

In [None]:
nums = list(range(100))

Use a list comprehension to count the number of vowels in this sentence.

In [None]:
sentence = 'The quick brown fox jumps over the lazy dog'

Use a comprehension expression to create a dictionary whose keys are the words in the sentence and whose values are the lengths of those words.

Using your dictionary, print each word and its length in order from shortest word to longest.

Define a function that takes an input array of numbers and returns the result that is computed by alternating addition with subtraction on the numbers in the array.

For example: if x = [1, 2, 3, 4, 5], then the function should return 3, since 1 - 2 + 3 - 4 + 5 = 3.

Consider the Python function below and the example of the input and output of this function. Describe in your own words what each step of the *process_tweets* function is doing.

In [None]:
def process_tweets(tweet_json, language='en'):
    post_dict = {}
    for tweet_id, tweet_data in tweet_json.items():
        try:
            if tweet_data['language'] == language:
                text = tweet_data['body'].lower().strip()
                
                post_dict[tweet_id] = {
                    'raw_text': tweet_data['body'],
                    'text': text,
                    'num_tokens': len(text.split())
                }
        except KeyError:
            continue
    
    return post_dict

In [None]:
tweets = {
    123456: {'language': 'en', 'user': 'user123', 'body': 'Hello friends'},
    789012: {'user': 'asdf', 'language': 'es', 'body': 'Hola Amigos'},
    543876: {'user': '__abcdef__', 'language': 'en', 'body': 'ToTaLLy nOt a BoT aCCouNt.'},
    623432: {'user': None}
}

In [None]:
en_posts = process_tweets(tweets)
en_posts

In [None]:
es_posts = process_tweets(tweets, language='es')
es_posts