<a href="https://colab.research.google.com/github/jefftwebb/undergrad_business_analytics/blob/main/Lab1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Lab 1: Hello World

In this course, you will learn to analyze data using Python, an open source programming language. Our focus, however, will not be on programming per se, but on data analysis. You will learn just enough programming to do data analysis. While this will give you a solid introduction to Python, other courses in your academic program will focus specifically on Python programming.

### Why use Python for data analysis?

There are many reasons. For our purposes, however, we will focus on one: collaboration. Using a programming language like Python to prepare and analyze data makes every step in the process transparent. For example, suppose I decide to remove rows with missing values (NAs) from a dataset during an analysis. That filtering step will be documented in my code, which makes it easy for my collaborators to see exactly what I've done and, if necessary, to correct or improve my work. Contrast this with how a spreadsheet program like Microsoft Excel is often used. Changes to the data in a spreadsheet typically go unrecorded and are therefore not available for easy review. Using a programming language for data analysis is thus a huge step forward for analytics because it *supports effective collaboration.* This is important because two heads are almost always better than one!

Note that other programming languages such as R, which is similar to Python, also work well for data analysis for the same reason.
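
To make the missing-values example above concrete, here is a minimal sketch in plain Python. The data are made up for illustration, and `None` stands in for a missing value; later in the course a library such as `pandas` would normally handle this step.

```python
# Hypothetical data: None marks a missing value (NA).
ages = [34, None, 27, 45, None, 52]

# Keep only the entries that are not missing.
# Because this step is written down, a collaborator can see
# exactly which values were dropped and how.
ages_complete = [age for age in ages if age is not None]

print(len(ages), "values before filtering")
print(len(ages_complete), "values after filtering")
print(ages_complete)
```

Anyone reading this chunk can see the filtering rule and change it if they disagree with it, which is the point of doing the analysis in code.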


### What is a Google Colab notebook?

A Colab notebook (such as this one) combines Python code with text written in Markdown, a very simple markup language for formatting documents. (If you double-click a text section in a Colab notebook you will see the raw Markdown on the left along with the rendered version on the right.) Data analysis occurs within the Python code chunks, while the text preceding or following those chunks contextualizes and interprets the analytic results. Colab notebooks are simple to share and use because they do not require specialized software and run directly in your browser.

Colab notebooks have the file extension .ipynb, which stands for "IPython notebook." These notebooks are easily saved in your Google Drive (make sure that you have a Google account with Drive enabled) either in notebook format (.ipynb) or as PDF. In this course you will be handing in labs as PDF documents. To convert your notebook to PDF follow these steps: File >> Print >> select "Save as PDF" under Destination in the upper right of the dialogue box >> click Save.

### Writing code in a notebook

Here is an example of a code chunk.



In [None]:
print("hello world")

What is happening here? If you hover your cursor over the left-hand corner of the code chunk you will see an arrow. Click that arrow to run the code in the chunk. In this case the Python function `print()` prints the words "hello world" included within the parentheses. By running the code in this way, readers can essentially participate in the analysis, going from chunk to chunk. Obviously, printing out "hello world" does not count as data analysis, but it should give you a sense of how a notebook can be used to support collaboration. For example, a reader could easily change the text in that line of code.

Here is a more substantive example.

In [None]:
x = 5

y = 7

x + y

`x` and `y` are examples of Python variables: names that refer to stored values. I have assigned values to them, which then allows me to do calculations with them. Note that the values persist from the chunks above to the chunks below.

In [None]:
x

In [None]:
y

The simplest way to use Python is as a calculator.

In [None]:
5 * 7 # or, the same thing:

In [None]:
x * y

Putting a hash symbol, `#`, in front of text in a chunk is known as "commenting out" that text. Any text or code following the `#` on the same line will not be evaluated by Python. It is useful to add comments to code chunks, particularly when things start getting complicated, to provide interpretive guidance for readers, as well as for yourself when you come back to an analysis after a break.
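
As a small illustration of the different ways `#` can be used (the variable name and value here are made up for the example):

```python
# A comment on its own line describes the step that follows.
hours_worked = 40  # an inline comment can also follow code on the same line

# The next line is "commented out", so Python skips it entirely:
# hours_worked = 0

print(hours_worked)
```

Because the second assignment is commented out, the value printed is the one from the first assignment.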

### Functions

Functions in Python are extremely important. Indeed, writing Python code largely consists of using functions. A function is just a set of operations for accomplishing a task that have been bundled together for convenience. As we saw above, `print()` is a function. Less obviously, operators such as `+` and `-` also behave like functions: each takes inputs and returns a result. Python includes many pre-defined functions but we can also create our own functions if needed.
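
Here is a rough sketch of both ideas. The function `rectangle_area` is a made-up example rather than anything built into Python; `operator.add` is part of Python's standard library and does the same job as `+`.

```python
import operator

# Define a small function of our own (name and task are illustrative).
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height

# Use the new function just like a pre-defined one.
area = rectangle_area(5, 7)
print(area)

# The + operator also has an explicit function form.
total = operator.add(5, 7)
print(total == 5 + 7)
```

You will not need to write your own functions often in this course, but it helps to know that the functions you use were defined in much the same way.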

We will use many statistical functions while doing data analysis. For example, to calculate the average of a set of numbers, which we will call a sample, we would use the `mean()` function. (Remember: an average is calculated by adding up all the numbers in a sample, then dividing by the size of the sample. The average of 1 and 2, for example, will be 3/2.) The mean turns out to be extremely useful for summarizing, in a single number, what is known as the "central tendency" of a sample.
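
The arithmetic in the parenthetical above can be checked by using Python as a calculator. This is only a sketch of the definition for two numbers, not the `mean()` function itself:

```python
# Average of 1 and 2: add the numbers, then divide by how many there are.
average = (1 + 2) / 2
print(average)  # same value as 3/2
```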

The syntax for using `mean()` is straightforward: simply include the numbers to be averaged within the parentheses. 




In [1]:
mean([5, 6, 7, 8, 9, 10])

NameError: name 'mean' is not defined



Whoops. Something has gone wrong: Python reports a `NameError` because it does not recognize the name `mean`.

There is a wrinkle here. It is important to understand that Python comes with various libraries that specialize in certain kinds of operations. In this course we will be using `numpy` (for working with numeric arrays) and `pandas` (for working with dataframes) among others. To use a function, then, we will usually need to first import its library. The `mean()` function is included in the `statistics` library. Let's import the `statistics` library.

In [3]:
import statistics

When using a function from an imported library we must identify both the library and the function. In this case the code will be: `statistics.mean()`. Let's try the above code again. Make sure that you have run the code chunk above that imports the `statistics` library.

In [4]:
statistics.mean([5, 6, 7, 8, 9, 10])

7.5
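
The same `library.function()` pattern applies to other libraries. As a brief sketch, the chunk below uses `math.sqrt()` from the standard `math` library and `statistics.median()` from the library already imported; neither is required for this lab.

```python
import math
import statistics

# library.function() with a different standard library
root = math.sqrt(49)
print(root)

# The statistics library offers other summaries too, such as the median
middle = statistics.median([5, 6, 7, 8, 9, 10])
print(middle)
```

Calling `sqrt(49)` or `median(...)` without the library name in front would produce the same kind of `NameError` seen earlier.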

### Assignment

Here are some tasks to give you practice using a Colab Python notebook.

1. Write Python code to produce the text "Submitted by: your name" where "your name" has been replaced with your name.
2. Write code to calculate the number of minutes in a day (24 hours).
3. Using the values of x and y defined above, divide x by y. (In Python, division is performed with the `/` operator.)
4. Create a new variable, z, defined as x divided by y, and show that z is the same as the value you calculated in the previous question.
5. Compute the average of the following numbers using `statistics.mean()`:  1, 2, 3, 4, 5, 6.

**Challenge**: Double-check the result you obtained for question 5 by writing your own code to calculate the mean.

For each question **include a comment within the code chunk that explains what you are doing**.

Code chunks have been provided for you below. Make sure to run each chunk (click the arrow at the far left of the chunk) so that your code has been evaluated and the results are showing before you convert the notebook to PDF.

Save your notebook to your Google Drive (following the instructions above) and then convert it to PDF.  Submit the PDF version to Canvas for the lab assignment.

Question 1:  Submitted by ...

Question 2: Calculate the number of minutes in a day

Question 3: divide x by y

In [None]:
x = 5
y = 7

# What is x divided by y?



Question 4: create z

In [None]:
# Define z as x divided by y.

# How do you know z has the same value as you calculated above in Q3?



Question 5: Calculate a mean

In [None]:
import statistics  # Import the Python module that includes the mean() function

values = [1, 2, 3, 4, 5, 6]  # Define the list of values

# Write code below to calculate the mean of values using the 
# library.function() syntax explained above.



Challenge:  Calculate the mean of 1, 2, 3, 4, 5, 6 without using `statistics.mean()`

In [None]:
# Calculate the mean
