# Intro to Jupyter Notebook and Other Useful Techniques!

Jupyter Notebook is a useful tool for writing organized Python code. Instead of putting all of your code in one long script, you organize it in chunks called cells. This text cell (and the heading above it) is a markdown cell, which can hold notes about your code or long non-code blocks of text! Some people even write entire reports or presentations in Jupyter Notebook, which is great if you want to include a lot of code or run code during a presentation. Using the dropdown menu between the refresh and keyboard buttons, change the next cell to markdown, type something, and press the "run" button above to run it!

Hover over the white buttons starting with the save icon to learn what each one does!

Keyboard Shortcuts
- ctrl + enter: run a cell
- when the cell has a BLUE box around it (click next to the 'In []:' part of the page to get the blue box), press a to add a cell above it or b to add a cell below it

# Pandas and Regular Expressions!

Pandas is a very useful data analytics library and will be useful to know in your data science/analytics classes and careers. This tutorial will teach you how to load a dataset into a data frame and make use of regular expressions. To learn more about regular expressions see this link: https://docs.python.org/3/library/re.html

In [None]:
#import libraries
import pandas as pd
import re

We are going to look at customer data from a fictional company. Start by reading the dataset chinook_customers.csv into a dataframe called customers in the cell below. Run the cell once you're done to see the data.
- pd.read_csv('insert file name here')

In [None]:
customers = pd.read_csv('chinook_customers.csv')
customers
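
Once a dataset is loaded, a few quick checks help confirm it was read correctly. The sketch below is only an illustration: it reads a small made-up CSV from a string (the column names and rows are hypothetical, not the contents of chinook_customers.csv) so that it runs without any file on disk. The same calls should work on the customers dataframe.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; these rows are made up for illustration
csv_text = """CustomerId,FirstName,Email
1,Ana,ana@example.com
2,Bob,bob@example.org
3,Cy,cy@example.net
"""

# read_csv accepts a file path or any file-like object, such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)             # (number of rows, number of columns)
print(sample.columns.tolist())  # column names as a plain list
print(sample.head(2))           # first two rows
```

In a notebook, leaving a dataframe name on the last line of a cell displays it, so the print calls are only needed when showing several things from one cell.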

There are a lot of things you can do with pandas now that this data is loaded. Pandas is the foundation for many of the data cleaning and machine learning tasks in the data science world. For this exercise, we are going to use regular expressions to validate email formats and phone numbers!

# Regular Expressions: Emails

- The goal: make sure the emails are all in a standard format of "text/numbers@text/numbers.text"
- Step 1 (completed for you, run the cell): create a true/false email format column - this column will say True if the email format is valid or False if it's not
- Step 2: write the regular expression and populate the valid email column
- Step 3 (completed for you, run the cell): return the rows where the true/false column is false - you should see 4 invalid emails!


In [None]:
#Step 1 - insert an empty (NaN) column called ValidEmail right after the Email column
#run this cell only once: insert() raises a ValueError if ValidEmail already exists
import numpy as np
customers.insert(loc=12, column='ValidEmail', value=np.nan)

In [None]:
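#Step 2 - regular expression to detect emails
regex = re.compile(r"insert regex here") #hint: \w\S*@.*\w\S\.\w\S
#note: ValidEmail stores the strings 'True'/'False' (not booleans), which is what Step 3 compares against
customers['ValidEmail'] = customers['Email'].apply(lambda x: 'True' if regex.match(x) else 'False')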
customers

In [None]:
#Step 3 - return invalid email rows!
customers[customers['ValidEmail'] == 'False']
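
One detail worth knowing about the steps above: `match` only checks that the pattern fits at the start of the string, while `fullmatch` requires the whole string to fit. The sketch below illustrates the difference on a few made-up addresses (hypothetical data, not from the dataset) and stores real booleans instead of strings. The pattern shown is just one possible simple pattern, not the definitive answer to Step 2.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from chinook_customers.csv
emails = pd.Series([
    "ana@example.com",
    "bob_at_example.com",
    "cy@example",
    "dee@mail.example.org",
])

# One possible pattern: word characters or dots, an @, then a domain ending in .text
pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")

# fullmatch returns a match object only if the entire string fits the pattern;
# bool() turns that into True/False
is_valid = emails.apply(lambda s: bool(pattern.fullmatch(s)))

# A boolean Series can be used directly as a filter; ~ flips True and False
invalid = emails[~is_valid]
print(invalid.tolist())
```

With a boolean column, filtering needs no string comparison, which is why `emails[~is_valid]` works here without `== 'False'`.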

# Regular Expressions: Phone Numbers

Often when we are working with data, we want to search for a particular pattern; phone numbers and email addresses, for example, follow common patterns. This task of searching and extracting is so common that Python ships with a powerful built-in module, re, for working with regular expressions. The syntax is a little odd, but once you get used to it, you will see how powerful regular expressions are and how much easier they can make your data-managing life.

Entire books have been written on the topic of regular expressions. For more detail on regular expressions, see:
https://docs.python.org/library/re.html

Since the phone numbers in this data vary in format, you will instead write regular expressions to detect the 3 phone number formats provided in the string below.

In [None]:
#RUN ME!
phonenumbers = '1234567890, 123-456-7890, (123) 456-7890'
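
Before writing the phone number patterns, it may help to see the same building blocks on a different kind of text. The sketch below uses a made-up string of dates (hypothetical, chosen so it does not give away the exercise) to illustrate `\d` for a digit, `{n}` for an exact repeat count, `\s` for whitespace, and backslash-escaped parentheses for literal ( and ).

```python
import re

# Hypothetical sample text: the same date written three ways
dates = "2021-03-04, 03/04/2021, (2021) 03-04"

# \d{4} means exactly four digits; - and / match themselves
dashed = re.findall(r"\d{4}-\d{2}-\d{2}", dates)
slashed = re.findall(r"\d{2}/\d{2}/\d{4}", dates)

# Parentheses are special in a regex, so literal ones need a backslash;
# \s matches the single space after the closing parenthesis
parenthesized = re.findall(r"\(\d{4}\)\s\d{2}-\d{2}", dates)

for found in (dashed, slashed, parenthesized):
    print(found)
```

`re.findall` returns a list of every non-overlapping match, so each pattern here picks out only the date written in its own format.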

In [None]:
#Find the first phone number pattern
pattern1 = re.findall(r"insert regular expression here", phonenumbers) 
for number in pattern1:
    print(number)

In [None]:
#Find the second phone number pattern
pattern2 = re.findall(r"insert regular expression here", phonenumbers) 
for number in pattern2:
    print(number)

In [None]:
#See if you can complete the third phone number pattern
pattern3 = re.findall(r"\(\d{3}\)\s\d{3}-", phonenumbers) 
for number in pattern3:
    print(number)