# DSC 305 Lab 1: Getting Started

## Objectives

* Introduce the Jupyter Notebook environment. 
* Provide a brief review of Python programming skills, including: 
  * functions, 
  * nested loops, 
  * math operations, and 
  * importing packages. 
* Import and work with a simple data set to answer questions.

## _Overview_

_To become an effective data scientist, you need extensive practice. Labs&mdash;weekly analytics assignments done outside of class&mdash;are an opportunity to strengthen your understanding of applied machine learning and other analytic methods.  While the value of each lab as a percent of your final grade is relatively small, the wisdom and experience you will gain from doing each lab will significantly help you on your tests and further career in data science and analytics._

_The first lab is an introduction to the Jupyter Notebook environment and a review of fundamental programming skills in Python 3. In most future labs, you will write code and analyze data in collaboration with a lab partner&mdash;and then independently write your report. **The first lab, however, is meant for you to do independently**&mdash;perhaps with some help from your instructor. We will be using the Jupyter environment for all of our labs and for the final project, so it is important that you are comfortable working in this environment._

## _General instructions_

_Your work will consist of code (in Code cells) and corresponding output, as well as formatted text (in Markdown cells).  In your lab reports, you will alternate between text exposition and code, sometimes with output in text or graphics.  You will describe what you are doing, provide the code to do it, and make observations.  A major advantage of the Jupyter Notebook type environment is that it allows you to submit your code, analytic work, and discussion as a single document that tells a "story" describing your work._

* _First, rename your file:_  

  * _Replace the word `instructions` with your last name, an underscore, and your first name._ 
  * _The `.ipynb` extension specifies that the file is an IPython (Jupyter) notebook file._ 
  * _The filename should be all lowercase.  So, if your name is Yennefer of Vengerberg, your file name would be_ `dsc305_lab01_vengerberg_yennefer.ipynb`. 
  * _This will be your file format for all future labs._ 

* _The files you start with include problems in **bold text**.  Solve each problem, explain your work, and analyze your results. Leave the instructions in place to help organize your work and to help me know which problem you are solving._

* _Your code should be readable and should have occasional helpful comments._

* _Alternate your code with Markdown cells describing what you are doing and summarizing your results._

* _The starter files contain additional instructions in_ italics _that you should delete before submitting your solution._ 

* _Once you have completed the assignment, upload your file along with ALL data files that you import from a local directory. Your submission should include everything I need to rerun it from start to finish. If any cell shows an error message or warning, you will receive a reduced score._

_Be sure to follow the instructions carefully for the labs. By following the instructions, you make it easier for me to assess your lab and get prompt feedback to everyone in time for your work on the subsequent lab. If you do not follow the instructions&mdash;even small things such as filenames or including data files&mdash;you will receive a reduced score._

# _Your work starts here. Good luck!_

_All of your files should start with a Markdown box with the following information on separate lines (you can make Markdown include linebreaks with two spaces at the end of the line, or you can use HTML tags if you prefer):_

your name (e.g., Yennefer of Vengerberg);  
name of your lab partner (e.g., Geralt of Rivia);  
course designation and semester (DSC 305A S20); and  
name of the assignment (e.g., Lab 1: Getting Started).

In [1]:
# Import any Python packages that you need here.
# All import statements should be in your first code cell. Example:
#   import pandas as pd
#   from math import sin, asin

**Import [this data file](https://raw.githubusercontent.com/jasperdebie/VisInfo/master/us-state-capitals.csv) containing the U.S. states and capitals and their geolocation (latitude and longitude).**

*You may use pandas if you like (which we will cover in class just after starting this lab), or use the methods that you learned in CSC 220. Store the data in a suitable data structure, such as a pandas dataframe, Python dictionary, or 2D list.*
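
_As one possible starting point, the sketch below parses CSV text into a list of dictionaries using only the standard library. The column names (`name`, `description`, `latitude`, `longitude`) and the single sample row are assumptions made for illustration; check the header row of the real file before relying on them._

```python
import csv
import io

# Illustrative stand-in for the downloaded file. The column names are
# assumptions; inspect the real header row before using them.
sample_csv = """name,description,latitude,longitude
Kentucky,Frankfort,38.1867,-84.8753
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting coordinates to floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        rows.append(record)
    return rows


rows = load_rows(sample_csv)
print(rows[0]["description"], rows[0]["latitude"], rows[0]["longitude"])
```

_If you prefer pandas, `pd.read_csv` accepts a URL directly and returns a dataframe._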

In [2]:
# Your code for this step goes here. 
# Be sure to replace this comment.

**Look closely at your data.  Do you notice any problems?  If so, address them!**

*Oops! It seems that the GitHub contributor left some HTML tags in when importing the data from the web. Remove the `<br>` tags from the capitals and address any other problems that you discover.*
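
_A minimal sketch of one way to strip such tags from a single string, assuming the debris looks like `<br>` or `<br/>`. The helper name `clean_name` is hypothetical; you would still need to apply it to the right field of every row._

```python
import re


def clean_name(raw):
    """Remove <br>-style tags and surrounding whitespace from a string."""
    without_tags = re.sub(r"<br\s*/?>", "", raw, flags=re.IGNORECASE)
    return without_tags.strip()


print(clean_name("Frankfort<br>"))  # prints Frankfort
```

_For the simplest case, `raw.replace("<br>", "")` does the same job without a regular expression._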

In [3]:
# Your code for this step goes here. 

**Sort the capitals from west to east.**

*Output the data in the format*
``Atlanta, Georgia``
*with one capital per line. Do not include the latitude and longitude fields.*
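
_West longitudes are negative, so sorting by longitude in ascending order runs from west to east. The sketch below illustrates the idea with three hypothetical records (not all of them capitals) stored as tuples; your own data structure and field names will likely differ._

```python
# Hypothetical records: (city, region, longitude in degrees; west is negative).
records = [
    ("Danville", "Kentucky", -84.7722),
    ("Frankfurt", "Germany", 8.6821),
    ("Frankfort", "Kentucky", -84.8753),
]


def west_to_east(items):
    """Return the records ordered by increasing longitude."""
    return sorted(items, key=lambda rec: rec[2])


for city, region, _ in west_to_east(records):
    print(f"{city}, {region}")
```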

In [4]:
# Your code goes here.

**Write a function, `distance(lat1, lon1, lat2, lon2)`, that uses the [haversine formula](https://en.wikipedia.org/wiki/Haversine_formula) to estimate the distance between two locations (given as latitude and longitude, in _degrees_) on the earth's surface.**

_The haversine formula (see link above) can be used to derive a distance between two points on a sphere, as follows:_

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1) \cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

_In the formula above, $\varphi_1$ and $\lambda_1$ are the latitude and longitude of the first location *in radians*, and $\varphi_2$ and $\lambda_2$ are those of the second *in radians*.  The $r$ in the formula is the earth's radius&mdash;that is, the distance from the center of the earth to the surface.  The earth isn't perfectly spherical, but you can use $r \approx 3959$ miles as an approximation._

_The formula may seem a little intimidating, but it is straightforward to compute. No loops! Remember that you can import from the `math` module. You are also welcome to experiment with NumPy if you like. Also note that latitude and longitude use *degrees* as units, but the formula above requires *radians*.  Recall that $360^{\circ} = 2\pi$ radians. The Python 3 `math` module provides a function `radians` that converts degrees to radians._
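
_The short sketch below only illustrates the unit conversion and how a squared sine can be written in Python; it is not the haversine formula itself. The variable names are made up for this example._

```python
import math

latitude_deg = 37.6456  # a latitude in degrees

# math.radians is equivalent to multiplying by pi / 180.
latitude_rad = math.radians(latitude_deg)
manual_rad = latitude_deg * math.pi / 180
print(math.isclose(latitude_rad, manual_rad))  # prints True

# sin^2(x) is written as the sine of x, squared.
sin_squared = math.sin(latitude_rad) ** 2
```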

_**Important: Yes, there are packages that you can import that will compute the Haversine formula for you.  However, for this assignment, you should write the code yourself.**_

_By the way, you may want to double-click on this Markdown cell to study how the formula above is typeset.  You can embed mathematical expressions directly into Markdown cells using $\LaTeX$. For example, the Markdown code_

`$$E = mc^2$$`

_compiles into the familiar formula_

$$E = mc^2$$

*You can find many more examples of how to typeset expressions at [this site](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html).*

In [5]:
def distance(lat1, lon1, lat2, lon2):
    return 0 # delete this line and write your function here

**Use a few examples to verify that your formula is correct.**
* _Here are some locations you can try:_
  * _Danville, Kentucky, USA (37.6456° N, 84.7722° W)_
  * _Frankfort, Kentucky, USA (38.1867° N, 84.8753° W)_
  * _Frankfurt, Germany (50.1109° N, 8.6821° E)_
* _The distance from Danville to Frankfort is about 38 miles._
* _The distance from Danville to Frankfurt is about 4,421 mi (&plusmn; 0.5%)._
* _The distance from any location to itself should be 0._
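
_One way to check a computed value against an expected value with a tolerance is `math.isclose`. The sketch below assumes a relative tolerance of 0.5%; the helper name `close_enough` is hypothetical, and the numbers passed to it are placeholders rather than results from a real `distance` function._

```python
import math


def close_enough(computed, expected, rel_tol=0.005, abs_tol=1e-9):
    """Return True when computed is within rel_tol (0.5% by default) of expected.

    abs_tol handles comparisons against an expected value of exactly 0.
    """
    return math.isclose(computed, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# Placeholder inputs only; substitute the output of your own function.
print(close_enough(4430.0, 4421.0))  # prints True (about 0.2% apart)
print(close_enough(4500.0, 4421.0))  # prints False (about 1.8% apart)
```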

**Use your function to compute the distance of every state capital from Danville, Kentucky.**

In [6]:
# Your code goes here

**Which two U.S. capitals are farthest apart? Which two are closest together?**

*To answer these questions, you will need to write code to compare every location in your dataset to every other location in your dataset.  You can do this, for example, in a nested loop. Your output should provide the answer.  Also include a Markdown box in which you state what you have found.*
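
_The nested-loop pattern can be sketched on a toy problem: finding the closest and farthest pair in a list of plain numbers, using the absolute difference as a stand-in for distance. The function name `extreme_pairs` and its `metric` parameter are hypothetical; adapting the idea to your capitals and your `distance` function is up to you._

```python
def extreme_pairs(items, metric):
    """Return (closest_pair, farthest_pair) over all unordered pairs of items."""
    closest = farthest = None
    smallest = float("inf")
    largest = float("-inf")
    for i in range(len(items)):
        # Start at i + 1 so each pair is visited once and no item meets itself.
        for j in range(i + 1, len(items)):
            value = metric(items[i], items[j])
            if value < smallest:
                smallest, closest = value, (items[i], items[j])
            if value > largest:
                largest, farthest = value, (items[i], items[j])
    return closest, farthest


numbers = [10, 3, 7, 25]
closest, farthest = extreme_pairs(numbers, lambda a, b: abs(a - b))
print(closest, farthest)  # prints (10, 7) (3, 25)
```

_Starting the inner loop at `i + 1` matters: comparing an item with itself would always give a distance of 0 and would wrongly win the "closest" contest._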

In [7]:
# Your code goes here.

## Conclusion

_Include a summary of what you did in this lab, and what you learned._

## Acknowledgements

_In this section, in each lab, you should thank anyone who provided assistance. Also note anyone whom you helped._

## References

_If you relied on any external sources, including websites, include a bibliographic citation for each one._

_Check that you have followed all of the instructions and answered all the questions and that your report is in the specified format, alternating between exposition and code with output. Then upload your work to Moodle._