# Intro to Pandas
by Ryan Orsinger

## Module 1: Intro to pandas series

### Pandas Series Part 3: Strings
- Sorting values
- Using pandas built-in string methods
- Assigning and reassigning results
- Using string methods for data cleaning
- Updating data types

In [None]:
import pandas as pd

In [None]:
fruits = pd.Series(["apple", "orange", "banana", "lemon", "lime", "pineapple", "blueberry", "raspberry", "cranberry"])
fruits

In [None]:
# .sort_values sorts strings alphabetically and numbers in numerical order
# Ascending is the default, so this is the same as fruits.sort_values(ascending=True)
fruits.sort_values()

In [None]:
fruits.sort_values(ascending=False)

In [None]:
# Use inplace=True to sort the original series without reassigning it
# For more on .sort_values, see https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html
fruits.sort_values(inplace=True)
fruits

In [None]:
# We can also reassign the series to hold the sorted values
# ignore_index=True relabels the index 0, 1, 2, ... to match the new order
fruits = fruits.sort_values(ignore_index=True)
fruits
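
The difference between a plain sort and `ignore_index=True` is easiest to see by looking at the index labels. The sketch below uses a small made-up series named `colors` (not part of the lesson data) so it does not overwrite `fruits`.

```python
import pandas as pd

# A small, hypothetical series used only for this illustration
colors = pd.Series(["red", "blue", "green"])

# sort_values returns a new series; each value keeps its original index label
sorted_colors = colors.sort_values()
print(sorted_colors.index.tolist())  # [1, 2, 0]

# ignore_index=True relabels the sorted result 0, 1, 2, ...
reset_colors = colors.sort_values(ignore_index=True)
print(reset_colors.index.tolist())  # [0, 1, 2]

# The original series is unchanged because we never reassigned it
print(colors.tolist())  # ['red', 'blue', 'green']
```

Keeping the original labels is useful when you need to trace values back to their starting positions; resetting them is usually tidier when the sorted order is the one you plan to keep.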

In [None]:
# .capitalize uppercases the first character of each string and lowercases the rest
fruits.str.capitalize()

In [None]:
fruits

In [None]:
# String methods return a new series and leave the original intact, so we assign the result to a variable to keep it
capitalized_fruits = fruits.str.capitalize()
capitalized_fruits

In [None]:
# .contains returns a Boolean series
# String methods live on the .str accessor; calling fruits.contains("apple") without it raises an AttributeError
fruits.str.contains("apple")

In [None]:
# Since .contains returns a Boolean series, we can use it to filter our results
fruits[fruits.str.contains("apple")]
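
To see why the `.str` accessor matters, here is a minimal sketch that calls `contains` both ways. The `snacks` series is made up for this illustration and is not part of the lesson data.

```python
import pandas as pd

# A small, hypothetical series used only for this illustration
snacks = pd.Series(["apple pie", "grape jam", "pineapple cake"])

# Without .str, pandas looks for a Series attribute named "contains" and fails
try:
    snacks.contains("apple")
except AttributeError as error:
    message = str(error)
    print(message)

# With .str, the method runs on every string and returns a Boolean series
mask = snacks.str.contains("apple")
print(snacks[mask].tolist())  # ['apple pie', 'pineapple cake']
```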

In [None]:
# .count to count substring occurrences
fruits.str.count("a")

In [None]:
fruits.str.count("berry")

In [None]:
# Summing up the results of .count
vowel_counts = fruits.str.count("a") + fruits.str.count("e") + fruits.str.count("i") + fruits.str.count("o") + fruits.str.count("u")
vowel_counts

In [None]:
# Using count with a Regular Expression character class
# Some of the Pandas string methods can utilize regular expressions
fruits.str.count("[aeiou]")

In [None]:
# We can use our new vowel count to filter values from the series
fruits[fruits.str.count("[aeiou]") > 2]
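
The character class `[aeiou]` only matches lowercase vowels, which is fine for `fruits` but would undercount mixed-case data. One possible approach, sketched below on a made-up `names` series, is to pass `flags=re.IGNORECASE` to `.str.count`; lowercasing the series first would work as well.

```python
import re

import pandas as pd

# A small, hypothetical series with mixed casing, used only for this illustration
names = pd.Series(["Apple", "ORANGE", "kiwi"])

# The pattern is case sensitive by default, so uppercase vowels are missed
lower_only = names.str.count("[aeiou]")
print(lower_only.tolist())  # [1, 0, 2]

# flags=re.IGNORECASE makes the same pattern match both cases
any_case = names.str.count("[aeiou]", flags=re.IGNORECASE)
print(any_case.tolist())  # [2, 3, 2]
```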

In [None]:
# .startswith returns a Boolean series
fruits.str.startswith("l")

In [None]:
fruits[fruits.str.startswith("l")]

In [None]:
# .endswith returns a Boolean series
fruits.str.endswith("berry")
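
Because `.startswith` and `.endswith` both return Boolean series, the results can be combined with `|` (or), `&` (and), and `~` (not) before filtering. The sketch below uses a made-up `produce` series so that `fruits` is left untouched.

```python
import pandas as pd

# A small, hypothetical series used only for this illustration
produce = pd.Series(["lemon", "lime", "blueberry", "loganberry", "apple"])

starts_with_l = produce.str.startswith("l")
ends_with_berry = produce.str.endswith("berry")

# | keeps rows where at least one condition is True
either = produce[starts_with_l | ends_with_berry]
print(either.tolist())  # ['lemon', 'lime', 'blueberry', 'loganberry']

# & keeps rows where both conditions are True
both = produce[starts_with_l & ends_with_berry]
print(both.tolist())  # ['loganberry']

# ~ flips a Boolean series; the parentheses are required
neither = produce[~(starts_with_l | ends_with_berry)]
print(neither.tolist())  # ['apple']
```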

In [None]:
# .len to get the length of the string
fruits.str.len()

In [None]:
# .lower to lowercase strings
shouts = pd.Series(["PLEASE", "LOWERCASE", "THESE", "STRINGS"])
not_shouts = shouts.str.lower()
not_shouts

In [None]:
# Using .replace to replace characters (replacing with "" removes them)
prices = pd.Series(["€5.99", "€12.25", "€95"])

# Be sure to reassign the variable
prices = prices.str.replace("€", "", regex=False)

# The values are still strings, so * 2 repeats each string instead of doubling the number
prices * 2

In [None]:
# Use .astype to convert strings that hold numbers to a numeric data type
prices = prices.astype(float)
prices * 2
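
`.astype(float)` assumes every string is a valid number. When the data may contain entries that are not, one alternative is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into `NaN`. The sketch below uses a made-up `raw` series containing a placeholder `"n/a"` entry; it is an illustration, not part of the lesson data.

```python
import pandas as pd

# A small, hypothetical series with one entry that is not a number
raw = pd.Series(["€5.99", "€12.25", "n/a"])

# Remove the currency symbol; regex=False treats "€" as a literal character
cleaned = raw.str.replace("€", "", regex=False)

# cleaned.astype(float) would raise a ValueError on "n/a"
# errors="coerce" converts what it can and marks the rest as NaN
amounts = pd.to_numeric(cleaned, errors="coerce")

print(pd.api.types.is_float_dtype(amounts))  # True
print(amounts.isna().tolist())  # [False, False, True]

# Arithmetic now works on the numeric values; NaN stays NaN
doubled = amounts * 2
print(doubled.tolist())
```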

In [None]:
# .upper to uppercase
fruits.str.upper()

## Further Reading
- [Pandas user guide for text](https://pandas.pydata.org/docs/user_guide/text.html)
- [Pandas user guide: essential basic functionality](https://pandas.pydata.org/docs/user_guide/basics.html)

## Exercises
- Create a series named `vegetables` using the list of strings `["Onion", "cucumber", "Carrot", "squash", "Potato", "Asperagus", "kale", "Broccoli", "spinach"]`
- Write the code necessary to lowercase all of the vegetables and reassign your series.
- Write the pandas code to sort the strings in alphabetical order. Ensure that the series stores the sorted order.
- Write the pandas code to show only the vegetables that start with a vowel.
- Write the pandas code to show the vegetables that have exactly two vowels.
<br><br>
- Now make a new series named `prices` that holds `["$2.99", "$1,200.25", "$5.99", "$2,350.00"]`.
- Reassign `prices` to hold only a string of numbers. Remove the `$` and `,` characters.
- Reassign `prices` to be a float data type.
- Now multiply your `prices` series by `0.9`.

In [None]:
# Create a series of vegetables ["Onion", "cucumber", "Carrot", "squash", "Potato", "Asperagus", "kale", "Broccoli", "spinach"]


In [None]:
# Write the code necessary to lowercase all of the vegetables and reassign your series.


In [None]:
# Write the pandas code to sort the strings in alphabetical order. Ensure that the series stores the sorted order


In [None]:
# Write the pandas code to show only the vegetables that start with a vowel


In [None]:
# Write the pandas code to show the vegetables that have exactly two vowels


In [None]:
# Make a new series named prices that holds ["$2.99", "$1,200.25", "$5.99", "$2,350.00"]


In [None]:
# Reassign prices to hold only a string of numbers. Remove the $ and , characters.


In [None]:
# Reassign prices to be a float data type


In [None]:
# Multiply your prices series by 0.9
