# Data Types

* 4 essential kinds of Python data with different powers and capabilities, like starter pack Pokémon    
    - Strings (Words)
    - Integers (Whole Numbers)
    - Floats (Decimal Numbers)
    - Booleans (True/False)

<img src="https://hips.hearstapps.com/digitalspyuk.cdnds.net/16/08/1456483171-pokemon2.jpg?resize=768:*">

**HEADS UP!**

🚨 To run any of the code on this page, you need to run this cell first!!🚨

In [None]:
import nltk
nltk.download('stopwords')

In [1]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

[('love', 93), ('like', 50), ('slay', 49), ('sorry', 44), ('okay', 42), ('oh', 38), ('get', 32), ('daddy', 28), ('let', 28), ('back', 24), ('said', 22), ('work', 21), ('cause', 21), ('ft', 21), ('hold', 20), ('night', 19), ('feel', 19), ('hurt', 19), ('best', 19), ('winner', 19), ('every', 18), ('bout', 18), ('money', 17), ('baby', 16), ('boy', 16), ('long', 16), ('shoot', 16), ('good', 16), ('catch', 16), ('know', 15), ('ooh', 15), ('got', 14), ('come', 14), ('pray', 14), ('way', 13), ('gon', 13), ('kiss', 13), ('rub', 12), ('girl', 12), ('see', 12)]


You might be wondering why we put quotation marks around `"../texts/music/Beyonce-Lemonade.txt"` but not around `40`, or why the file path shows up in red and `40` shows up in green.

In [13]:
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

That's because these are two different "types" of Python data. The file path is what's called a "string," or words, and 40 is an "integer," or a whole number.

In Python, there are four basic data types:

- Strings (Words)
- Integers (Whole Numbers)
- Floats (Decimal Numbers)
- Booleans (True/False)

Each data type has different properties and different capabilities. You can check the data type of any value with the function `type()`.


In [14]:
type(filepath_of_text)

str

In [15]:
type(number_of_desired_words)

int
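
As a quick sketch, here is `type()` applied to one value of each of the four basic data types. The sample values and variable names are made up for illustration and are not part of the Lemonade analysis.

```python
# One made-up example value for each basic data type
sample_string = "Lemonade"
sample_integer = 40
sample_float = 40.5
sample_boolean = True

# type() reports the data type of whatever value you give it
for sample in [sample_string, sample_integer, sample_float, sample_boolean]:
    print(sample, type(sample))
```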

Let's look at what happens if we change the data types of `filepath_of_text` and `number_of_desired_words`. First we remove the quotation marks from the file path, then we put quotation marks around `40`.

In [235]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = ../texts/music/Beyonce-Lemonade.txt
nltk_stop_words = stopwords.words("english")
number_of_desired_words = 40

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

SyntaxError: invalid syntax (<ipython-input-235-82fd123c8c66>, line 10)

In [240]:
import re
from collections import Counter
from nltk.corpus import stopwords

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
nltk_stop_words = stopwords.words("english")
number_of_desired_words = "40"

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in nltk_stop_words]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

print(most_frequent_meaningful_words)

TypeError: '>=' not supported between instances of 'str' and 'int'
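
The second error comes from `most_common()`, which expects an integer but received the string `"40"`. The small sketch below reproduces the problem with a made-up word list (an assumption, so that it runs without the lyrics file) and shows one possible fix: converting the string to an integer with `int()`.

```python
from collections import Counter

# A tiny made-up word list, standing in for the Lemonade lyrics
sample_tally = Counter(["love", "slay", "love", "sorry", "love", "slay"])

# A number wrapped in quotation marks is a string, not an integer
number_as_string = "2"

try:
    sample_tally.most_common(number_as_string)
    error_was_raised = False
except TypeError as error:
    # most_common() cannot compare a string with a number
    error_was_raised = True
    print("TypeError:", error)

# int() converts the string "2" into the integer 2
top_two_words = sample_tally.most_common(int(number_as_string))
print(top_two_words)
```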

# Strings (Words)

- Enclosed by either single or double quotation marks (it doesn't matter which, but the opening and closing marks have to match)
- Ability to combine strings with `+`
- Ability to manipulate in special ways (make lowercase or uppercase, replace parts, grab slices, etc.)
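
The capabilities listed above can be sketched with a short made-up string. The variable names here are hypothetical and are not used elsewhere on this page.

```python
album_title = "Lemonade"

# Combine strings with +
artist_and_album = "Beyonce: " + album_title

# Make lowercase or uppercase
lowercase_title = album_title.lower()
uppercase_title = album_title.upper()

# Replace part of a string
new_title = album_title.replace("Lemon", "Lime")

# Grab a slice (the first five characters)
first_five_characters = album_title[:5]

print(artist_and_album)
print(lowercase_title, uppercase_title)
print(new_title)
print(first_five_characters)
```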

In [21]:
print(full_text)

Six inch heels, she walked in the club like nobody's business
Goddamn, she murdered everybody and I was her witness

She's stacking money, money everywhere she goes
You know, pesos out of Mexico
De uno, commas and them decimals
She don't gotta give it up, she professional
She mixing up that Ace with that Hennessy
She love the way it tastes, that's her recipe
Rushing through her veins like it's ecstasy (Oh no)
She already made enough, but she'll never leave

Six inch heels, she walked in the club like nobody's business
Goddamn, she murdered everybody and I was her witness
She works for the money, she work for the money
From the start to the finish
And she worth every dollar, she worth every dollar
And she worth every minute

She work for the money
She work for the money
She work for the money
She work for the money

She stack her money, money everywhere she goes
She got that Sake, her Yamazaki straight from Tokyo
Oh baby, you know, she got them commas and them decimals
She don't gotta g

In [19]:
type(full_text)

str

## Extract Parts of Strings

By using square brackets `[]`, you can "index"—or grab—part of a string based on its character number. The first line of Beyonce's album Lemonade is "Six inch heels, she walked in the club like nobody's business." If we index the first character, what do you think we'll get?

In [24]:
full_text[1]

'i'

In [25]:
full_text[0]

'S'

This is one weird thing about Python that you're just going to have to commit to memory. The zero-th place in a Python index is actually the very first place. Its number system starts with zero.

By using a colon `:`, we can grab a "slice" of a string that runs from the beginning up to, but not including, a certain character number.

In [27]:
full_text[:61]

"Six inch heels, she walked in the club like nobody's business"

In [263]:
lemonade_first_line = full_text[:61]
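
Slices can also start somewhere other than the beginning. The sketch below retypes the first line as its own string (so that it runs without the lyrics file) and shows a few more slice patterns. The variable names are made up for illustration.

```python
first_line = "Six inch heels, she walked in the club like nobody's business"

# Characters 0, 1, and 2 (the slice stops just before position 3)
first_word = first_line[:3]

# Start at position 4 and stop just before position 8
second_word = first_line[4:8]

# A negative index counts backward from the end of the string
last_character = first_line[-1]
last_word = first_line[-8:]

# len() counts how many characters are in the string
number_of_characters = len(first_line)

print(first_word, second_word, last_character, last_word, number_of_characters)
```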

## Add Strings Together

In [266]:
lemonade_first_line + ", and then she took off her shoes because they were pretty tall"

"Six inch heels, she walked in the club like nobody's business, and then she took off her shoes because they were pretty tall"

In [293]:
lemonade_first_line.upper()

"SIX INCH HEELS, SHE WALKED IN THE CLUB LIKE NOBODY'S BUSINESS"

## Put Variables Inside Strings

You can also insert variables into a string with something called **f-strings**! They're amazing. An f-string must begin with an `f` outside the quotation marks, and the variable must be placed within curly brackets `{}`, like so:

In [267]:
print(f"Beyonce stepped on stage and sang: \n\n'{lemonade_first_line}'")

Beyonce stepped on stage and sang: 

'Six inch heels, she walked in the club like nobody's business'
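
Curly brackets can hold more than a plain variable name. As a small sketch with made-up variable names (the word and count come from the tally at the top of this page), an f-string can also insert numbers and the results of simple calculations:

```python
top_word = "love"
top_word_count = 93

# Variables of any data type can go inside the curly brackets
sentence = f"The word '{top_word}' appears {top_word_count} times."

# String methods and math work inside the curly brackets too
doubled = f"{top_word.upper()} x 2 = {top_word_count * 2}"

print(sentence)
print(doubled)
```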


### Your Turn

Remix! Make a variable called `new_first_line` and assign it the value of:

`lemonade_first_line` plus (`+`) your new remixed ending

Then print it.

In [None]:
new_first_line = #Your Code Here

In [None]:
print(f"Beyonce stepped on stage and sang: \n\n'{#Your Code Here}'")

# Integers & Floats (Numbers)

- You can do math with them!

In [268]:
number_of_desired_words = 40

In [269]:
type(number_of_desired_words)

int

In [270]:
number_of_desired_words + 57

97

In [272]:
number_of_desired_words = 40.5

In [273]:
type(number_of_desired_words)

float

In [274]:
number_of_desired_words + 57

97.5

In [275]:
type(40.555555555555)

float
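
Here is a brief sketch of how integers and floats behave when you do math with them. The extra variable names are made up for illustration.

```python
number_of_desired_words = 40

# Integer plus integer gives an integer
sum_of_integers = number_of_desired_words + 57

# Integer plus float gives a float
sum_with_float = number_of_desired_words + 57.5

# Division with / always gives a float, even when it divides evenly
regular_division = number_of_desired_words / 8

# Division with // between two integers gives an integer
whole_division = number_of_desired_words // 8

# int() turns a float into an integer by dropping the decimal part
back_to_integer = int(40.5)

print(sum_of_integers, sum_with_float, regular_division, whole_division, back_to_integer)
```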

# Booleans (True/False)

Booleans are like little judgments. They report on whether things in your Python universe are `True` or `False`.

In [10]:
beyonce = "Grammy award-winner"

In [6]:
beyonce == "Grammy award-winner"

True

In [7]:
beyonce == "Oscar award-winner"

False

In [11]:
beyonce == "Grammy award winner"

False

In [8]:
type(beyonce == "Oscar award-winner")

bool
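
Booleans also come out of comparisons between numbers and out of checks for whether one string appears inside another. The sketch below uses two counts from the tally at the top of this page; the variable names are made up for illustration.

```python
love_count = 93
slay_count = 49

# Greater-than comparison between two integers
love_beats_slay = love_count > slay_count

# == checks whether two values are equal
counts_are_equal = love_count == slay_count

# != checks whether two values are different
counts_are_different = love_count != slay_count

# "in" checks whether one string appears inside another
mentions_heels = "heels" in "Six inch heels"

print(love_beats_slay, counts_are_equal, counts_are_different, mentions_heels)
```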

# In-Class Exercise

In [29]:
name = 'Melanie'
age = 1000
home_town = 'Chicago'
favorite_food = 'tacos'
dog_years_age = age * 7

In [30]:
print(f'Introducing.....{name}!')

print(f'{name} likes {favorite_food} and once lived in {home_town}. Additionally {name} is {age} years old ({dog_years_age} in dog years).')

Introducing.....Melanie!
Melanie likes tacos and once lived in Chicago. Additionally Melanie is 1000 years old (7000 in dog years).


Your turn!

In [None]:
name = 
age = 
home_town = 
favorite_food = 
dog_years_age =

In [None]:
print(f'Introducing.....{name}!')

print(f'{name} likes {favorite_food} and once lived in {home_town}. Additionally {name} is {age} years old ({dog_years_age} in dog years).')