# Growing from Base Python to Pandas

## Comparison Operators
- Return a True or False value
- `==, !=, <, <=, >=, >`
- These operators matter because they answer questions for us, so we don't have to check values by hand

In [1]:
# Single equal sign assigns a value to a variable
greeting = "Hello"

# Double equal sign returns true/false when comparing values on the left to values on the right
greeting == "hello"

False

In [2]:
# We need to operate on variables without manually checking them
len(greeting) > 5

False

In [3]:
len(greeting) >= 5

True

## Other operators and functions can give us Trues and Falses
- Why are booleans valuable? Because we can use the yes/no answer to make decisions
- Specifically, we make sure the CODE makes decisions, not people.
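
As a minimal sketch of letting the code make the decision, the boolean from a comparison can drive an `if`/`else`. The `is_long` and `label` names and the label text are made up for illustration.

```python
greeting = "Hello"

# The comparison produces a boolean...
is_long = len(greeting) > 5

# ...and the if/else uses that boolean to pick a branch
if is_long:
    label = "long greeting"
else:
    label = "short greeting"

print(label)  # short greeting
```

Nobody had to count the characters by hand; the comparison answered the question and the conditional acted on it.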

In [4]:
# The in operator gives us a true or a false
"a" in "good morning"

False

In [5]:
# We have lots of functions that give us true/false answers
greeting.isnumeric()

False

In [6]:
# But here, the string containing the numeral 4 is numeric
"4".isnumeric()

True

In [7]:
greeting.endswith("Z")

False

In [8]:
greeting.startswith("H")

True

## Punchline: operators and functions give us booleans.
- So what?
- We can use booleans to make decisions with conditionals like an if or if/else
- We can also use booleans to filter our data.
- And filtering data we want vs. data we don't want is critical
- We may want to tighten our filters with ANDs
    - Allergic to peanuts and shellfish and dairy and eggs means we have more limited options
    - And is about filtering things down more and more
- We may want to expand our filters with ORs
    - I'm good with pizza or pasta or salads or curry
    - ORs expand our options, adding more options
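
The tightening and expanding described above can be sketched with a plain list and a list comprehension. The `menu` data here is hypothetical, invented only to show how `and` shrinks a result while `or` grows it.

```python
# Hypothetical menu data: dish names and allergen sets are made up for illustration
menu = [
    {"dish": "pad thai", "allergens": {"peanuts", "eggs"}},
    {"dish": "shrimp pasta", "allergens": {"shellfish", "dairy"}},
    {"dish": "garden salad", "allergens": set()},
    {"dish": "cheese pizza", "allergens": {"dairy"}},
]

# AND tightens: every condition must be True, so fewer dishes pass
safe = [
    item["dish"]
    for item in menu
    if "peanuts" not in item["allergens"] and "dairy" not in item["allergens"]
]

# OR expands: a single True condition is enough, so more dishes pass
cravings = [
    item["dish"]
    for item in menu
    if "pizza" in item["dish"] or "pasta" in item["dish"] or "salad" in item["dish"]
]

print(safe)      # ['garden salad']
print(cravings)  # ['shrimp pasta', 'garden salad', 'cheese pizza']
```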

In [9]:
# Does this string start with "H" and end with "o"
greeting.startswith("H") and greeting.endswith("o")

True

In [10]:
# We only need a single True in a collection of ORs to "Truthify" the whole expression
"howdy".startswith("h") or "howdy".endswith("z")

True

In [11]:
# Does this string start with "H" and end with "o" and the string is longer than five characters
greeting.startswith("H") and greeting.endswith("o") and len(greeting) > 5

False

## So what does this have to do with Pandas?
- What does this have to do with operating on entire series or arrays in pandas or numpy?
- The "big deal" is that we can use operators and functions on an entire collection of values instead of one single value
- If you can operate on an entire collection with the same/similar effort, you have a force multiplier
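
As a quick preview of that force multiplier, here is one way to compare the two approaches side by side. The variable names `car_list`, `matches_loop`, and `matches_series` are illustrative only.

```python
import pandas as pd

# A plain Python list of car names
car_list = ["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"]

# Base Python: one comparison per value, spelled out as a loop
matches_loop = [car == "ford" for car in car_list]

# pandas: the same comparison applied to every value in one expression
cars = pd.Series(car_list)
matches_series = cars == "ford"

print(matches_loop)             # [False, False, False, True, False, False, False]
print(matches_series.tolist())  # [False, False, False, True, False, False, False]
```

Both approaches give the same answers; the Series version just asks the question of the whole collection at once.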

In [12]:
import pandas as pd

In [13]:
# A Series is basically a fancy, more powerful list.
cars = pd.Series(["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"])
cars

0     Batmobile
1         honda
2          yugo
3          ford
4        toyota
5       hyundai
6    volkswagen
dtype: object

In [14]:
# Check every value in the series against "ford"
# The True lands at index 3 of the output because "ford" sits at index 3 of the input
cars == "ford"

0    False
1    False
2    False
3     True
4    False
5    False
6    False
dtype: bool

## An array/Series of booleans is the key to the castle
- We can use our boolean arrays to filter our data
- If we put the boolean array in square brackets next to the original array, we filter the results

In [15]:
# It's a good idea to save your boolean arrays to their own variable
is_ford = cars == "ford"
is_ford

0    False
1    False
2    False
3     True
4    False
5    False
6    False
dtype: bool

In [16]:
# Hey cars series, only give me the values where is_ford is True
# Bracket syntax looks/feels like list index syntax
cars[is_ford]

3    ford
dtype: object

In [17]:
# We don't HAVE to make a variable for the boolean array, 
# We can put that boolean expression in the square brackets directly
cars[cars == "ford"]

3    ford
dtype: object
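
The AND/OR ideas from earlier carry over to boolean Series, with one catch: pandas combines them element by element with `&` and `|` rather than the keywords `and` and `or`, and inline comparisons each need their own parentheses. A minimal sketch using the same `cars` values (the `is_honda` name and the choice of "honda" are just for illustration):

```python
import pandas as pd

cars = pd.Series(["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"])

is_ford = cars == "ford"
is_honda = cars == "honda"

# OR expands the filter: keep rows where either mask is True
ford_or_honda = cars[is_ford | is_honda]

# AND tightens the filter: keep rows where both masks are True
# (no car is both "ford" and "honda", so nothing survives)
ford_and_honda = cars[is_ford & is_honda]

# Written inline, each comparison is wrapped in parentheses
inline = cars[(cars == "ford") | (cars == "honda")]

print(ford_or_honda.tolist())   # ['honda', 'ford']
print(ford_and_honda.tolist())  # []
```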

## Nice Things:
- Pandas uses the comparison operators directly; no extra functions or setup are required

In [18]:
# How can we find the cars that end with the letter "a"
cars

0     Batmobile
1         honda
2          yugo
3          ford
4        toyota
5       hyundai
6    volkswagen
dtype: object

In [19]:
# .startswith and ALL the other string methods run on one string
"honda".startswith("h")

True

In [20]:
# But, they don't work directly on a series....
# We have a little problem now.
# This would raise an AttributeError, because a Series has no .startswith method of its own
# cars.startswith("y")

In [21]:
# Here's what the pandas library programmers did:
# Introduced the .str. component
# Notice that the original function name is the same (many pandas string methods are the same or similar)
# Our price of admission: series.str.string_method()
cars.str.startswith("h")

0    False
1     True
2    False
3    False
4    False
5     True
6    False
dtype: bool

In [22]:
cars.str.startswith("Bat")

0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool
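
The same `.str` pattern can answer the earlier question about cars that end with the letter "a". A short sketch, assuming the same `cars` Series (the `ends_with_a` name is illustrative):

```python
import pandas as pd

cars = pd.Series(["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"])

# .str.endswith checks every string and returns a boolean Series
ends_with_a = cars.str.endswith("a")

# Use the boolean Series in square brackets to keep only the matches
print(cars[ends_with_a].tolist())  # ['honda', 'toyota']
```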

In [23]:
# .count counts how many times a substring appears in a string
"banana".count("a")

3

In [24]:
# Count the letter "a" in all the cars
# series.str.count("a") counts the "a" characters in each string
cars.str.count("a")

0    1
1    1
2    0
3    0
4    1
5    1
6    1
dtype: int64

In [25]:
# We can expand this a bit
# Count the "a" and "e" characters
cars.str.count("a") + cars.str.count("e")

0    2
1    1
2    0
3    0
4    1
5    1
6    2
dtype: int64

In [26]:
# Count the vowels for each car
cars.str.count("a") + cars.str.count("e") + cars.str.count("i") + cars.str.count("o") + cars.str.count("u")

0    4
1    2
2    2
3    1
4    3
5    3
6    3
dtype: int64
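
`Series.str.count` treats its argument as a regular expression pattern, so a character class such as `[aeiou]` can count all five vowels in one call. This is a sketch of an alternative, not a replacement for the step-by-step sum above. It is case-sensitive, so an uppercase "A" would not be counted.

```python
import pandas as pd

cars = pd.Series(["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"])

# One regular expression character class covers all five lowercase vowels
vowel_counts = cars.str.count("[aeiou]")

# The long-hand version from the lesson, for comparison
summed = (
    cars.str.count("a")
    + cars.str.count("e")
    + cars.str.count("i")
    + cars.str.count("o")
    + cars.str.count("u")
)

print(vowel_counts.tolist())  # [4, 2, 2, 1, 3, 3, 3]
```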

In [27]:
cars

0     Batmobile
1         honda
2          yugo
3          ford
4        toyota
5       hyundai
6    volkswagen
dtype: object

In [28]:
# We can use these numeric outputs together with comparison operators like ==, >, >=, etc.
# Only show the cars that have one or more "a" characters
# Step 1: count the letter "a"
cars.str.count("a")

0    1
1    1
2    0
3    0
4    1
5    1
6    1
dtype: int64

In [29]:
# Close, but not quite there...
# The counts are integers, not Trues and Falses, so this is not a filter yet
# cars[cars.str.count("a")]

In [30]:
# Since we need Python to make a yes/no choice, we NEED an operator/function
# that would give us back Trues or Falses
cars.str.count("a") >= 1

0     True
1     True
2    False
3    False
4     True
5     True
6     True
dtype: bool

In [31]:
cars

0     Batmobile
1         honda
2          yugo
3          ford
4        toyota
5       hyundai
6    volkswagen
dtype: object

In [32]:
cars.str.count("a") >= 1

0     True
1     True
2    False
3    False
4     True
5     True
6     True
dtype: bool

In [33]:
# This boolean comparison returns only the "ford"
# Think of it like SQL: SELECT * FROM cars WHERE car = 'ford'
cars[cars == "ford"]

3    ford
dtype: object

In [34]:
cars[cars.str.count("a") >= 1]

0     Batmobile
1         honda
4        toyota
5       hyundai
6    volkswagen
dtype: object
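
For a plain "does it contain this text?" check, `.str.contains` gives the boolean Series directly, without counting first. A sketch assuming the same `cars` Series; passing `regex=False` asks pandas to treat "a" as literal text rather than as a pattern, and the names `has_a` and `by_count` are illustrative.

```python
import pandas as pd

cars = pd.Series(["Batmobile", "honda", "yugo", "ford", "toyota", "hyundai", "volkswagen"])

# True wherever the string contains at least one "a"
has_a = cars.str.contains("a", regex=False)

# The count-then-compare version from the lesson gives the same mask
by_count = cars.str.count("a") >= 1

print(cars[has_a].tolist())
# ['Batmobile', 'honda', 'toyota', 'hyundai', 'volkswagen']
```

Counting is still the better fit when the threshold is something other than "at least one".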

In [35]:
# .str.len() returns the length of each string
cars.str.len()

0     9
1     5
2     4
3     4
4     6
5     7
6    10
dtype: int64

In [36]:
fruits = pd.Series(["kiwi", "mango", "strawberry", "pineapple", "gala apple", "honeycrisp apple", "tomato", "watermelon", "honeydew", "kiwi", "kiwi", "kiwi", "mango", "blueberry", "blackberry", "gooseberry", "papaya"])
fruits

0                 kiwi
1                mango
2           strawberry
3            pineapple
4           gala apple
5     honeycrisp apple
6               tomato
7           watermelon
8             honeydew
9                 kiwi
10                kiwi
11                kiwi
12               mango
13           blueberry
14          blackberry
15          gooseberry
16              papaya
dtype: object

In [37]:
# Write the code to get the string values with 5 or more characters in the name (spaces count).
# Think of this like SQL
# SELECT * from fruits where len(fruit) >= 5
fruits[fruits.str.len() >= 5]

1                mango
2           strawberry
3            pineapple
4           gala apple
5     honeycrisp apple
6               tomato
7           watermelon
8             honeydew
12               mango
13           blueberry
14          blackberry
15          gooseberry
16              papaya
dtype: object

In [38]:
len("banana")

6

In [39]:
"banana".count("a")

3

In [40]:
# find the fruit(s) containing the letter "o" two or more times.
# think of this like: select * from fruits where the count of "o" characters is >= 2
fruits[fruits.str.count("o") >= 2]

6         tomato
15    gooseberry
dtype: object

## Takeaways
- Break each problem down into its smallest components 
- Build a tiny solution to each tiny problem
- Stitch together the tiny solutions with Python syntax
- If you can produce a series of booleans, you can filter your original series
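
As one possible walk-through of these takeaways, here is a made-up question about the `fruits` Series: which fruits end in "berry" and are at least 10 characters long? Each condition gets its own small boolean Series, and `&` (the element-by-element version of `and` for Series) stitches them together. The question and the variable names are illustrative only.

```python
import pandas as pd

fruits = pd.Series([
    "kiwi", "mango", "strawberry", "pineapple", "gala apple", "honeycrisp apple",
    "tomato", "watermelon", "honeydew", "kiwi", "kiwi", "kiwi", "mango",
    "blueberry", "blackberry", "gooseberry", "papaya",
])

# Tiny problem 1: does the name end in "berry"?
is_berry = fruits.str.endswith("berry")

# Tiny problem 2: is the name 10 or more characters long?
is_long = fruits.str.len() >= 10

# Stitch the two boolean Series together, then filter the original Series
long_berries = fruits[is_berry & is_long]

print(long_berries.tolist())  # ['strawberry', 'blackberry', 'gooseberry']
```

"blueberry" passes the first check but has only 9 characters, so the AND filters it out.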
