# Data Types

* 4 essential kinds of Python data with different powers and capabilities, like starter pack Pokémon    
    - Strings (Words)
    - Integers (Whole Numbers)
    - Floats (Decimal Numbers)
    - Booleans (True/False)

<img src="https://hips.hearstapps.com/digitalspyuk.cdnds.net/16/08/1456483171-pokemon2.jpg?resize=768:*">

**HEADS UP!**

🚨 To run any of the code on this page, you need to run this cell first!!🚨

In [None]:
import nltk
nltk.download('stopwords')

In [50]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

[('love', 93), ('like', 50), ('slay', 49), ('sorry', 44), ('okay', 42), ('oh', 38), ('get', 32), ('daddy', 28), ('let', 28), ('back', 24), ('said', 22), ('work', 21), ('cause', 21), ('ft', 21), ('hold', 20), ('night', 19), ('feel', 19), ('hurt', 19), ('best', 19), ('winner', 19), ('every', 18), ('bout', 18), ('money', 17), ('baby', 16), ('boy', 16), ('long', 16), ('shoot', 16), ('good', 16), ('catch', 16), ('know', 15), ('ooh', 15), ('got', 14), ('come', 14), ('pray', 14), ('way', 13), ('gon', 13), ('kiss', 13), ('rub', 12), ('girl', 12), ('see', 12)]


You might be wondering why we put quotation marks around `"../texts/music/Beyonce-Lemonade.txt"` but not around `40`, or why the file path shows up in red and `40` shows up in green.

In [13]:
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

That's because these are two different "types" of Python data. The file path is what's called a "string," or words, and 40 is an "integer," or a whole number.
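Here's a quick sketch of why the difference matters. The two variable names below are made up for illustration and are not part of the word-counting code above. The same `+` sign does arithmetic on integers but glues strings together:

```python
# Hypothetical values, chosen only to illustrate the difference.
as_integer = 40    # a whole number
as_string = "40"   # two characters that happen to look like a number

print(as_integer + as_integer)  # arithmetic: 80
print(as_string + as_string)    # joining text: 4040
```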

In Python, there are four basic data types:

- Strings (Words)
- Integers (Whole Numbers)
- Floats (Decimal Numbers)
- Booleans (True/False)

Each data type has different properties and different capabilities. You can check a value's data type with the function `type()`. As you may have noticed, functions use parentheses, and they do something to whatever is inside the parentheses, which we call an "argument."


In [2]:
type(filepath_of_text)

str

In [15]:
type(number_of_desired_words)

int
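The same check works for all four basic types. A small sketch with made-up values (note that `print()` shows the type as `<class 'str'>`, while a notebook cell that ends in `type(...)` shows just `str`):

```python
# Made-up values, one for each of the four basic types.
print(type("Lemonade"))  # <class 'str'>
print(type(40))          # <class 'int'>
print(type(40.5))        # <class 'float'>
print(type(True))        # <class 'bool'>
```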

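Let's look at what happens if we change the data types of `filepath_of_text` and `number_of_desired_words`. In the first cell below, the file path has no quotation marks; in the second, `40` is wrapped in quotation marks.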

In [235]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = ../texts/music/Beyonce-Lemonade.txt
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

SyntaxError: invalid syntax (<ipython-input-235-82fd123c8c66>, line 10)

In [240]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = "40"

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

TypeError: '>=' not supported between instances of 'str' and 'int'
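To see that second error in isolation, here is a minimal sketch that uses a tiny made-up word list instead of the album (`toy_tally` and `wanted_as_string` are hypothetical names). `most_common()` needs a number, so handing it the string `"2"` fails, and converting the string with `int()` first is one way to fix it:

```python
from collections import Counter

# A tiny made-up word list standing in for the album's words.
toy_tally = Counter(["love", "love", "slay", "sorry", "love", "slay"])

wanted_as_string = "2"

try:
    # A string where a number is expected
    toy_tally.most_common(wanted_as_string)
except TypeError as error:
    print(f"TypeError: {error}")

# Converting the string to an integer first makes the call work.
print(toy_tally.most_common(int(wanted_as_string)))
```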

# Strings (Words)

- Enclosed by either single or double quotation marks (either works, as long as the opening and closing marks match)
- Ability to combine strings with `+`
- Ability to manipulate in special ways (make lowercase or uppercase, replace parts, grab slices, etc.)

In [51]:
full_text



**Heads up!**
The `\n` above means "new line." 

In [19]:
type(full_text)

str

## Extract Parts of Strings

### Index

By using square brackets `[]`, you can "index" (that is, grab) part of a string based on its character number. The first line of Beyonce's album *Lemonade* is:
> Six inch heels, she walked in the club like nobody's business.

If we ask for character number `1`, and then character number `0`, which characters do you think we'll get?

In [None]:
full_text[1]

In [None]:
full_text[0]

This is one weird thing about Python that you're just going to have to commit to memory: it starts counting at zero. Index `0` is the very first character, and index `1` is the second.
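Here's a small sketch of zero-based indexing. It assumes the album's first line typed out by hand (the variable `first_line` is a stand-in, so you don't need `full_text` loaded to try it):

```python
# The album's first line, typed out by hand as a stand-in for full_text.
first_line = "Six inch heels, she walked in the club like nobody's business"

print(first_line[0])   # S  (the first character)
print(first_line[1])   # i  (the second character)
print(first_line[-1])  # s  (negative numbers count back from the end)
```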

### Slice

By using a colon `:` inside the square brackets, we can "slice" a string, grabbing everything up to (but not including) a certain character number.

In [67]:
full_text[:61]

"Six inch heels, she walked in the club like nobody's business"

In [52]:
lemonade_first_line = full_text[:61]
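Slices can also start somewhere other than the beginning. A short sketch, again assuming the first line typed out by hand as a stand-in variable called `first_line`:

```python
first_line = "Six inch heels, she walked in the club like nobody's business"

print(first_line[:3])   # Six       (from the start up to, not including, index 3)
print(first_line[4:8])  # inch      (from index 4 up to, not including, index 8)
print(first_line[-8:])  # business  (the last 8 characters)
print(len(first_line))  # 61        (total number of characters)
```

The line is 61 characters long, which is why `full_text[:61]` grabs exactly the first line.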

## Add Strings Together

In [266]:
lemonade_first_line + ", and then she took off her shoes because they were pretty tall"

"Six inch heels, she walked in the club like nobody's business, and then she took off her shoes because they were pretty tall"

## Lots of Handy String Methods!

For more, see [String Methods](https://melaniewalsh.github.io/Intro-Cultural-Analytics/Python/String-Methods.html)

In [68]:
lemonade_first_line.upper()

"SIX INCH HEELS, SHE WALKED IN THE CLUB LIKE NOBODY'S BUSINESS"

A method is like a function, except that it follows the thing it's going to act upon, joined by a dot `.`, and it doesn't necessarily need an "argument" in the parentheses.
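To make the function-versus-method difference concrete, here is a small sketch with a made-up string (`album_title` is just an example name). `len()` is a built-in function, and `.upper()`, `.lower()`, and `.replace()` are built-in string methods:

```python
album_title = "Lemonade"  # a made-up example string

# Function: the name comes first, and the string goes inside the parentheses.
print(len(album_title))                        # 8

# Methods: the string comes first, then a dot, then the method name.
print(album_title.upper())                     # LEMONADE
print(album_title.lower())                     # lemonade
print(album_title.replace("Lemon", "Orange"))  # Orangeade
```

Notice that `.replace()` does take arguments, while `.upper()` and `.lower()` don't need any.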

## Put Variables Inside Strings

You can also insert variables into a string with something called **f-strings**! They're amazing. An f-string must begin with an `f` outside the quotation marks, and then the variable must be placed within curly brackets `{}`, like so:

In [267]:
print(f"Beyonce stepped on stage and sang: \n\n'{lemonade_first_line}'")

Beyonce stepped on stage and sang: 

'Six inch heels, she walked in the club like nobody's business'
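The curly brackets can hold more than a plain variable name. In this sketch (the values of `song` and `times_played` are made up), they hold an integer, a bit of math, and a string method:

```python
# Made-up values for illustration.
song = "Formation"
times_played = 3

print(f"I played '{song}' {times_played} times today.")
print(f"Played twice as often, that would be {times_played * 2} times.")
print(f"Shouted: {song.upper()}!")
```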


### Your Turn

Remix! Make a variable called `new_first_line` and assign it the value of `lemonade_first_line` plus (`+`) your own remixed ending. Then print it by placing `new_first_line` inside the f-string.

In [None]:
new_first_line = #Your Code Here

In [None]:
print(f"Beyonce stepped on stage and sang: \n\n'{#Your Code Here}'")

# Integers & Floats (Numbers)

- You can do math with them!

In [268]:
number_of_desired_words = 40

In [269]:
type(number_of_desired_words)

int

In [270]:
number_of_desired_words + 57

97

In [22]:
new_number = 100005

In [23]:
number_of_desired_words * new_number

4000200

In [None]:
number_of_desired_words = 40.5

In [273]:
type(number_of_desired_words)

float

In [275]:
type(40.555555555555)

float
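Integers and floats mix freely in math, but the type of the answer can change. A quick sketch with throwaway numbers:

```python
print(40 + 2)        # 42    (int + int stays an int)
print(40 + 0.5)      # 40.5  (mixing in a float gives a float)
print(40 / 8)        # 5.0   (division with / always gives a float)
print(40 // 8)       # 5     (// keeps only the whole-number part)
print(type(40 / 8))  # <class 'float'>
```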

## Multiplication

In [26]:
4 * 2

8

## Exponents

In [46]:
4 ** 2

16

## Order of Operations

In [31]:
4 + 2 * 2

8

In [30]:
(4 + 2) * 2

12
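The same rules extend to exponents and division. In this sketch (numbers chosen only for illustration), exponents go first, then multiplication and division, then addition and subtraction, unless parentheses say otherwise:

```python
print(2 * 3 ** 2)    # 18   (the exponent is applied first: 3 ** 2 is 9)
print((2 * 3) ** 2)  # 36   (parentheses force the multiplication first)
print(10 - 4 / 2)    # 8.0  (division happens before subtraction)
print((10 - 4) / 2)  # 3.0
```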

# Booleans (True/False)

Booleans are like little judgments. They report on whether things in your Python universe are `True` or `False`.

In [10]:
beyonce = "Grammy award-winner"

In [6]:
beyonce == "Grammy award-winner"

True

In [7]:
beyonce == "Oscar award-winner"

False

In [11]:
beyonce == "Grammy award winner"

False

In [8]:
type(beyonce == "Oscar award-winner")

bool
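`==` isn't the only way to get a Boolean. Here's a short sketch of a few other comparisons, using a made-up variable called `words_wanted`:

```python
words_wanted = 40  # a made-up value

print(words_wanted == 40)        # True
print(words_wanted != 40)        # False (!= means "not equal to")
print(words_wanted > 100)        # False
print("love" in "love drought")  # True  (is the left text inside the right text?)
print("Love" == "love")          # False (capital letters count)
```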

# Type Conversion

In [43]:
number_of_desired_words = 40
number_of_desired_words

40

In [44]:
type(number_of_desired_words)

int

In [None]:
str(number_of_desired_words)

In [47]:
converted_num_desired_words = str(number_of_desired_words)

In [48]:
type(converted_num_desired_words)

str

In [55]:
int(lemonade_first_line)

ValueError: invalid literal for int() with base 10: "Six inch heels, she walked in the club like nobody's business"
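Conversion works when the value actually looks like the type you're asking for. A small sketch with throwaway values, including one that fails on purpose:

```python
print(int("40"))           # 40
print(float("40.5"))       # 40.5
print(str(40) + " words")  # 40 words
print(int(40.9))           # 40  (the decimal part is dropped, not rounded)

try:
    int("forty")  # letters can't be read as a whole number
except ValueError as error:
    print(f"ValueError: {error}")
```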

# In-Class Exercise

In [60]:
name = 'Prof. Walsh' #string
age = 1000 #integer
place = 'Chicago' #string 
favorite_food = 'tacos' #string
dog_years_age = age * 7.5 #float
student = False #boolean

In [61]:
print(f'✨This is...{name}!✨')

print(f'{name} likes {favorite_food} and once lived in {place}. {name} is {age} years old, which is {dog_years_age} in dog years. The statement "{name} is a student" is {student}.')

✨This is...Prof. Walsh!✨
Prof. Walsh likes tacos and once lived in Chicago. Prof. Walsh is 1000 years old, which is 7500.0 in dog years. The statement "Prof. Walsh is a student" is False.


## Your turn!

Make sure `name`, `place`, and `favorite_food` are strings. Make sure `age` is an integer and `dog_years_age` is a float. Make sure `student` is a boolean.

In [None]:
name = #Your code here
age = #Your code here
place = #Your code here
favorite_food = #Your code here
dog_years_age = #Your code here * 7.5
student = False #boolean

In [None]:
print(f'✨This is...{name}!✨')

print(f'{name} likes {favorite_food} and once lived in {place}. {name} is {age} years old, which is {dog_years_age} in dog years. The statement "{name} is a student" is {student}.')

Now add a new variable called `favorite_movie` and update the f-string to include a new sentence about the person's favorite movie.

In [None]:
name = 
age = 
place = 
favorite_food = 
dog_years_age = 
#favorite_movie = 

In [None]:
print(f'✨This is...{name}!✨')

print(f'{name} likes {favorite_food} and once lived in {place}. {name} is {age} years old, which is {dog_years_age} in dog years. The statement "{name} is a student" is {student}. # YOUR NEW SENTENCE HERE')