# Lecture 1 - Introduction to Data Science

## What is Data Science?

<div><img src="figures/DataScience.png" width="800"></div>

**Data Science** is a broad inter-disciplinary field.

Put succinctly, it can be defined as generating **meaning** from **data** or, alternatively, turning data into **knowledge**.

* **Data** is any type of measurable information. For example: ratings on an Amazon product (numerical or written), heart rate as collected through a wearable device, number of scored shots in a given basketball game 
* **Knowledge** is extracted from data using principles of probability and statistics, and is often summarized in the form of a plot
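
As a small illustration of turning data into knowledge, the sketch below summarizes a handful of heart-rate readings using only the Python standard library. The readings are made up for illustration; they are not course data.

```python
import statistics

# Hypothetical heart-rate readings (beats per minute) from a wearable device
heart_rate_bpm = [62, 65, 71, 68, 90, 74, 66]

# Summary statistics condense the raw readings into a few numbers
mean_bpm = statistics.mean(heart_rate_bpm)
median_bpm = statistics.median(heart_rate_bpm)
stdev_bpm = statistics.stdev(heart_rate_bpm)  # sample standard deviation

print(f"mean = {mean_bpm:.1f} bpm")
print(f"median = {median_bpm} bpm")
print(f"standard deviation = {stdev_bpm:.1f} bpm")
```

Note how the single reading of 90 pulls the mean above the median; spotting that kind of effect is part of extracting meaning from data.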

### Some examples of data science problems we have studied in previous years:

* Do gun laws reduce firearms mortality?

* If we look at the effect of state laws on firearms mortalities, what other factors might be responsible for any observed differences? Do urban and rural states have different firearms mortalities? How about richer or poorer states?

* Do males score higher on standardized high school math and science tests than females?

* Do first pregnancies last longer than other pregnancies?

* How should we measure the effectiveness of medical tests, including risks with false identification and missed detection of disease?
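
To make the last question concrete, here is a minimal sketch of how Bayes' rule combines false identifications and missed detections into a single measure: the probability of disease given a positive test. The function name and all numbers (prevalence, sensitivity, false-positive rate) are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule."""
    p_positive_and_disease = sensitivity * prevalence
    p_positive_and_healthy = false_positive_rate * (1.0 - prevalence)
    return p_positive_and_disease / (p_positive_and_disease + p_positive_and_healthy)

# Hypothetical test characteristics (assumptions, not real medical data)
prevalence = 0.01           # 1% of the population has the disease
sensitivity = 0.95          # P(positive | disease); missed-detection rate is 0.05
false_positive_rate = 0.05  # P(positive | no disease)

ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)
print(f"P(disease | positive test) = {ppv:.3f}")
```

With these assumed numbers, only about 16% of positive results correspond to actual disease, even though the test looks accurate, because the disease is rare.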

## What tools will we use to perform data science?

<div><img src="figures/tools.png" width="800"></div>

## Why should you care?

According to **CareerCast**'s report [The Best Jobs of 2019](https://www.careercast.com/jobs-rated/best-jobs-of-2019?page=0):

<div class="alert alert-info" role="alert">
  <strong>1. Data Scientist</strong>
  
Overall Rating: 97

Median Salary: \\$114,520

Projected Growth: 19%
</div>

<div class="alert alert-info" role="alert">
  <strong>2. Statistician</strong>
  
Overall Rating: 110

Median Salary: \\$84,760

Projected Growth: 33%
</div>

# EEL 4930 - Data Science for ECE

**Course Description:** (4 credits) Analysis, processing, simulation, and reasoning of data. Includes data conditioning and plotting, linear algebra, statistical methods, probability, simulation, and experimental design.

* **<font color=blue>This course relies on both programming and math!</font>**

* We will use Python almost every day, for class assignments, **and for exams**

![DataScienceSoftware](https://i2.wp.com/r4stats.com/wp-content/uploads/2019/05/Fig-1a-IndeedJobs-2019-1.png?w=650)

## Programming in Python

* You are **not** expected to already know Python
* However, you do need good basic programming skills to do well in this course

#### What Python proficiency should you expect to gain in this class?

I will **not** teach you everything you need to know about Python during class hours; instead, I will teach you how to use it for visualization, data processing, simulation, etc.

* I **will** provide you with lots of resources and help to get going in Python

* I **will** be available during office hours to help you with Python issues

## Analytical work

We will also do analytical work in probability and statistics, which requires proficiency in calculus.

* I have uploaded to Canvas a Jupyter Notebook with review material up to Calculus II (equivalent to MAC 2312).
* No prior knowledge of probability or statistics is needed

**Pre-requisite:** MAC 2312 (Calculus 2). Students are expected to bring a portable computer to class meetings and need basic programming skills.

## What does a typical lecture look like?

A typical lecture will be similar to the "Programming with Python" video I uploaded to our Canvas page. You can view it [here](https://ufl.instructure.com/courses/404371/pages/orientation).

* I will publish the class notes (Jupyter notebooks) before every lecture
* During class, I will live code
* I will share the notebook with edits after class
* In lectures with analytical problems, I will use my iPad with the [Explain Everything app](https://explaineverything.com/download/) as a virtual whiteboard
* I will share handwritten whiteboard pages after class

## Course Objectives (as time allows)

Upon completion of this course, the student will be able to:

1. Generate visualizations to expose meaning in data
2. Generate and understand the meaning and uses of summary statistics of data
3. Model random phenomena using random variables
4. Generate random variables with specified densities or distributions
5. Conduct hypothesis tests using simulations and analysis
6. Understand and use conditioning to simplify problems
7. Estimate parameters of distributions from samples
8. Understand dependence and independence among random phenomena
9. Use statistical tests to determine or characterize dependence among random phenomena
10. Design experiments to understand random phenomena
11. Understand the difference between Bayesian statistics and classical statistics
12. Use simulation to calculate Bayesian statistics
13. Apply linear algebra for data processing and statistical calculations

The main goal of this course is to equip the students with a data science mindset for successful practical implementations, in particular: understand, analyze, and design an approach to work with a data science or electrical engineering problem.

### Contribution of course to meeting the professional component

4 credits of Engineering Science

**Relationship of course to program outcomes:**
1. An ability to identify, formulate, and solve engineering problems by applying principles of engineering, science, and mathematics. $\Rightarrow$ High

2. An ability to apply both analysis and synthesis in the engineering design process, resulting in designs that meet desired needs. $\Rightarrow$ High

3. An ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions. $\Rightarrow$ High

## Instructor

**Dr. Catia S. Silva** (preferred Dr. Silva or Prof. Silva)

* Office: *my make-up home office* or NEB 467
* Phone: (352) 392-6502
* Email: catiaspsilva@ece.ufl.edu (email via Canvas is preferred)
* [Office hours](https://ufl.instructure.com/courses/404371/pages/office-hours): Mondays, Wednesdays and Fridays 3:00 PM - 4:00 PM (EST zone) via Zoom

## Time commitment

We will meet 4 times per week, 50 minutes each.

Sometimes (not every week) I will post additional video recordings to watch before class. I typically keep them short, 20-30 minutes.

As a student, I used the *Rule of Four Passes* to determine the time commitment for a course:

* Essentially it takes at least four complete passes through your lecture materials from start to finish to be able to retain it for the exam. 
* The first pass is during lecture time (50 minutes) where you listen, take notes and ask questions. 
* The second pass aims at a deeper understanding of the material: typically you rewrite your in-class notes, find answers to misunderstandings or unanswered questions, and implement the coding examples yourself.
* In the third pass, you rigorously connect the material between lectures using your notes and code from the second pass, and identify the important concepts to retain.
* The fourth pass is self-review: you review all the material, quiz yourself, and practice exercises from the textbook or previous exams.

Expect to spend approximately 5-6 hours/week studying lecture notes, plus additional time for homework.

## Textbooks and Software Required

1. **Anaconda Distribution**

* with Python 3.8
* It includes all libraries, modules and tools we will use: Jupyter notebooks, ```NumPy```, ```Matplotlib```, ```SciPy```, ```Pandas```, ```scikit-learn```, ```random```
* Download it [here](https://www.anaconda.com/products/individual)

<h5 align="center">Some popular libraries in Anaconda</h5>

![Anaconda](https://www.anaconda.com/imager/assetsdo/Products/8031/open-source-logos2x_680db6b6f11f9cc710dd7defae241cd3.png)

You have 2 options to manage your packages and virtual environment/s:

1. using ```pip```, a system that manages Python packages.
2. using ```conda```, a system that manages packages that may be written in any programming language.

Since we will use Python packages, you can use either system to manage your virtual environment. Which one to use depends on your specific needs. I typically use ```conda```, and that is sufficient for this course.

To create and manage your **virtual environments**, here are good sources:

* using ```conda``` to [manage virtual environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html?highlight=environment#creating-an-environment-with-commands).
* using ```pip``` to [manage virtual environments](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
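
Once your environment is set up, a quick sanity check is to confirm that the libraries listed above can be found by Python. The sketch below is one possible way to do this with the standard library; the helper name `find_missing` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries listed above
# (scikit-learn is imported as "sklearn"; "random" ships with Python itself)
required_modules = ["numpy", "matplotlib", "scipy", "pandas", "sklearn", "random"]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(required_modules)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Run it from inside the activated environment (for example, in a Jupyter notebook cell); any name reported as missing still needs to be installed with ```conda``` or ```pip```.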

2. **Introduction to Probability**

* Authors: Dimitri P. Bertsekas, John N. Tsitsiklis
* Edition: 2nd
* Publisher: Athena Scientific
* Year: 2008
* ISBN: 978-1-886529-23-6

An **e-book version** will be cheaper and is perfectly fine for this course. The authors created an instructional digital version of the book and you can download it [here](http://faculty.pucit.edu.pk/faisal/ma249/book.pdf)

![BertsekasTextbook](https://images-na.ssl-images-amazon.com/images/I/51YN28ow7rL._SX389_BO1,204,203,200_.jpg)

3. **Introduction to Applied Linear Algebra - Vectors, Matrices, and Least Squares**

* Authors: Stephen Boyd, Lieven Vandenberghe
* Edition: 1st
* Publisher: Cambridge University Press
* Year: 2018
* ISBN: 978-1-316-51896-0

An **e-book version** will be cheaper and is perfectly fine for this course: you can download it [here](http://vmls-book.stanford.edu/vmls.pdf)

![BoydTextbook](https://images-na.ssl-images-amazon.com/images/I/418ANMKbtQL._SX379_BO1,204,203,200_.jpg)

* Additional reading will be listed on our Canvas page
* All extra reading materials will point to **digitally-available** books posted on Course Reserves

## Course Schedule

A complete course schedule can be found in our [Syllabus](https://ufl.instructure.com/courses/404371/files/folder/Syllabus?preview=51631926).

The semester will have 2 parts:

1. **Probability and Statistics.** Main textbook will be "Introduction to Probability". Week 1-8 (~30 lectures)
2. **Linear Algebra.** Main textbook will be "Introduction to Applied Linear Algebra". Week 9-15 (~23 lectures)

<h2 align="center">Have you heard of Zoom?</h2>


![Zoom](https://cuit.columbia.edu/sites/default/files/styles/cu_crop/public/content/zoom-logo-transparent-6.png?itok=PJk3QEss)

## Zoom Settings and Expectations

1. Choose to turn on your camera. It will help me *read your faces* and identify clarity issues. 

2. I will ask you to keep your microphone muted throughout the lesson unless you have a question or comment.

3. Ask questions! To ask a question in class, either:
    * Unmute your microphone and speak up
    * Type your question in the chat box
    * Raise your hand, using the *raise hand* feature (under center-low bar --> People --> Raise Hand)
    
4. All lectures will be **recorded** (audio and video). All videos will be available to you on our Canvas page under the "Zoom Conferences" icon.
    * Students who participate with their camera engaged or utilize a profile image are agreeing to have their video or image recorded. 
    * If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. 
    * Likewise, students who unmute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the "chat" feature, which allows students to type questions and comments live. 
    * The chat will **not** be recorded or shared. As in all courses, unauthorized recording and unauthorized sharing of recorded materials is prohibited.

## Course Homepage

In this course we will use two main *households*:

1. [Canvas page](https://ufl.instructure.com/courses/404371): I will post announcements here. You will also use it to send and receive emails, complete participation assignments through discussion boards, and ask and answer questions about assignments through discussion boards.

2. [GitHub Organization](https://github.com/Data-Science-for-ECE-F20): I will post all lecture notes in this private organization. You will complete all assignments in a private repository and send its URL to Canvas as the assignment submission.
    * To receive an invitation to join the organization, first join the GitHub Classroom by accepting the [Short Assignment 0](https://ufl.instructure.com/courses/404371/assignments/4332677) assignment, which creates your repository.
    * **Clone (at least) the "Lectures" repository to your local machine and pull from that repository before class**
    * Be sure to [download Git](https://git-scm.com/downloads)
    * Complete one (or a few) introductory tutorials:
        * Git bootcamp: https://help.github.com/categories/bootcamp/
        * Tutorials: https://www.atlassian.com/git/tutorials/
        * Interactive Introduction: https://try.github.io/

## Course Policies

Please read the syllabus carefully.

1. **How to get help:** Other than office hours, you can contact me via email (via Canvas or using my email address), call me on my cellphone, or use the Slack channel to communicate with your classmates.
    * Slack channel: https://join.slack.com/t/uf-eel4930-fall2020/shared_invite/zt-gp53hhdv-vsxWciTfhQlNh_TU7hvibQ
    
2. **Attendance:** I expect students to attend class, and graded evaluations (participation, in-class assignments) will be given during class. If you are living in a different time zone or anticipate any issues (e.g. internet connection, schedule conflicts), please email me so that I am informed and do not *penalize* your participation points.

3. **Grading:** make sure your submissions are carefully completed, with clean and well-documented code. Make full use of Jupyter features, such as markdown cells. Individual assignments will **not** be curved. Final grades **will** be curved.

4. **Late Work:** I will accept all assignment submissions as long as solutions have not yet been released, but you will lose the **on-time** points listed in the rubric. Solutions will typically be released up to 1 week after the assignment is due.

5. **Make-Up Policy:** If you feel that any assignment needs to be re-graded, you must discuss this with me within 1 week of grades being posted. If approved, the entire assignment will be subject to complete evaluation. Excused absences must be consistent with university policies in the [undergraduate catalog](https://catalog.ufl.edu/ugrad/current/regulations/info/attendance.aspx) and require appropriate documentation.

6. **Collaboration:** healthy collaboration is encouraged. If another student contributes substantially to your understanding of a problem, you should cite this student. You will not be negatively judged for citing another student.

7. **Cheating and Plagiarism:** you are expected to submit your own work. If you are suspected of dishonest academic activity, I will invite you to discuss it further in private. Academic dishonesty will likely result in grade reduction, with severity depending on the nature of the dishonest activity. I am obligated to report on academic misconduct with a letter to the department, college and/or university leadership. Repeat offenses will be treated with significantly greater severity.

## Grading

Grading will be based on: 

|Assignment|Total|Percentage Final Grade|
|---|---|---|
| Exams | 3 | 20% each|
| Homework | ~ 5 | 20%|
| Short Assignments | ~ 12 | 10% |
| Participation | ~10 + in-class | 10%|

**Homeworks** will have 2 parts: (1) analytical exercises, typically solved on paper. (2) practical problems to be implemented in Python.

**Exams** will be drawn from lectures and readings. Practice exams will be provided. Exams are conducted via Honorlock.

**Short Assignments** will typically consist of short problems to help consolidate and retain the information learned in class.

**Participation** will take the form of discussion-board participation **and** in-class participation (attending class also counts as participation).

### Honorlock

The midterm and final exams will be conducted via **Honorlock**. 

Honorlock is an online proctoring platform. Be sure to read the syllabus to find out what you will need to take an exam with Honorlock.

**<h2 align="center"><font color=orange> Mark your calendars!</font></h2>**

* **<font color=blue> Exam 1 Date (for both sections): Thursday, October 6</font>**

* **<font color=blue> Exam 2 Date (for both sections): Thursday, November 10</font>**

Use Canvas to select all time slots that work for you for both exams: [Canvas Survey](https://ufl.instructure.com/courses/404371/quizzes/883723)

* **<font color=blue> Final Exam Date (scheduled)</font>**
    * **<font color=blue> section 0003: Friday, December 18 @ 7:30 AM - 9:30 AM</font>**
    * **<font color=blue> section 0004: Wednesday, December 16 @ 3:00 PM - 5:00 PM</font>**

#### Grading Scale

|Percent|Grade|Grade Points|
|--|--|--|
|93.4 - 100| A| 4.00|
|90.0 - 93.3| A-|3.67|
|86.7 - 89.9| B+|3.33|
|83.4 - 86.6| B |3.00|
|80.0 - 83.3| B- |2.67|
|76.7 - 79.9| C+ |2.33|
|73.4 - 76.6| C |2.00|
|70.0 - 73.3| C- |1.67|
|66.7 - 69.9| D+ |1.33|
|63.4 - 66.6| D |1.00|
|60.0 - 63.3| D- |0.67|
|0 - 59.9| E |0.00|
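
To see how the grading table and the grading scale fit together, here is a minimal sketch that computes an uncurved final percentage and looks up the corresponding letter. The scores are hypothetical, the names `WEIGHTS`, `SCALE`, `final_percentage`, and `letter_grade` are illustrative, and treating the lower bound of each range as the cutoff is an assumption. Since final grades will be curved, treat the result only as a rough estimate.

```python
# Weights from the grading table (fractions of the final grade)
WEIGHTS = {
    "exam_1": 0.20,
    "exam_2": 0.20,
    "exam_3": 0.20,
    "homework": 0.20,
    "short_assignments": 0.10,
    "participation": 0.10,
}

# (lower bound, letter) pairs from the grading scale, highest first.
# Assumption: the lower bound of each range acts as the cutoff.
SCALE = [
    (93.4, "A"), (90.0, "A-"), (86.7, "B+"), (83.4, "B"), (80.0, "B-"),
    (76.7, "C+"), (73.4, "C"), (70.0, "C-"), (66.7, "D+"), (63.4, "D"),
    (60.0, "D-"), (0.0, "E"),
]

def final_percentage(scores):
    """Weighted average of category scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

def letter_grade(percent):
    """Return the letter for the first range whose lower bound is met."""
    for lower_bound, letter in SCALE:
        if percent >= lower_bound:
            return letter
    return "E"

# Hypothetical category scores for one student
scores = {
    "exam_1": 82, "exam_2": 88, "exam_3": 79,
    "homework": 94, "short_assignments": 100, "participation": 100,
}

percent = final_percentage(scores)
print(f"Uncurved final percentage: {percent:.1f} -> {letter_grade(percent)}")
```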

## Students Requiring Accommodations

Students with disabilities who experience learning barriers and would like to request academic accommodations should connect with the Disability Resource Center by visiting https://disability.ufl.edu/students/get-started/.

* Please make sure you share your accommodation letter with me as soon as you have it, so we can discuss your access needs.

## Course Evaluations

You are expected to provide professional and respectful feedback on the quality of instruction in this course by completing the course evaluations online via GatorEvals.

* The University is using a relatively new evaluation system, and evaluation results are now publicly available here: https://gatorevals.aa.ufl.edu/public-results/

* Guidance on how to give feedback in a professional and respectful manner is available at https://gatorevals.aa.ufl.edu/students/. 

* You will be notified when the evaluation period opens, and can complete evaluations through the email you receive from GatorEvals, in the Canvas course menu under GatorEvals, or via https://ufl.bluera.com/ufl/.

## University Honesty Policy

All UF students are bound by The Honor Pledge which states:

<h5 align="center">We, the members of the University of Florida community, pledge to hold ourselves and our peers to the highest standards of honor and integrity by abiding by the Honor Code. On all work submitted for credit by students at the University of Florida, the following pledge is either required or implied: "On my honor, I have neither given nor received unauthorized aid in doing this assignment."</h5>

The [Honor Code](https://sccr.dso.ufl.edu/policies/student-honor-code-student-conduct-code/) specifies a number of behaviors that are in violation of this code and the possible sanctions. Furthermore, you are obligated to report any condition that facilitates academic misconduct to appropriate personnel. If you have any questions or concerns, please consult with the instructor or TAs in this class.

## Commitment to a Safe and Inclusive Learning Environment

The Herbert Wertheim College of Engineering values broad diversity within our community and is committed to individual and group empowerment, inclusion, and the elimination of discrimination.  It is expected that every person in this class will treat one another with dignity and respect regardless of gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture.

If you feel like your performance in class is being impacted by discrimination or harassment of any kind, please contact your instructor or any of the following:

* Your academic advisor or Graduate Program Coordinator
* Robin Bielling, Director of Human Resources, 352-392-0903, rbielling@eng.ufl.edu
* Curtis Taylor, Associate Dean of Student Affairs, 352-392-2177, taylor@eng.ufl.edu
* Toshikazu Nishida, Associate Dean of Academic Affairs, 352-392-0943, nishida@eng.ufl.edu

**Software Use**

All faculty, staff, and students of the University are required and expected to obey the laws and legal agreements governing software use.  Failure to do so can lead to monetary damages and/or criminal penalties for the individual violator.  Because such violations are also against University policies and rules, disciplinary action will be taken as appropriate.  We, the members of the University of Florida community, pledge to uphold ourselves and our peers to the highest standards of honesty and integrity.

**Student Privacy**

There are federal laws protecting your privacy with regards to grades earned in courses and on individual assignments.  For more information, please see:  https://registrar.ufl.edu/ferpa.html



## Health and Wellness

**U Matter, We Care:**

Your well-being is important to the University of Florida.  The U Matter, We Care initiative is committed to creating a culture of care on our campus by encouraging members of our community to look out for one another and to reach out for help if a member of our community is in need.  If you or a friend is in distress, please contact umatter@ufl.edu so that the U Matter, We Care Team can reach out to the student in distress.  A nighttime and weekend crisis counselor is available by phone at 352-392-1575.  The U Matter, We Care Team can help connect students to the many other helping resources available including, but not limited to, Victim Advocates, Housing staff, and the Counseling and Wellness Center.  Please remember that asking for help is a sign of strength.  In case of emergency, call 9-1-1.

**Counseling and Wellness Center:** http://www.counseling.ufl.edu/cwc, and  392-1575; and the University Police Department: 392-1111 or 9-1-1 for emergencies. 

**Sexual Discrimination, Harassment, Assault, or Violence**

If you or a friend has been subjected to sexual discrimination, sexual harassment, sexual assault, or violence, contact the Office of Title IX Compliance, located at Yon Hall Room 427, 1908 Stadium Road, (352) 273-1094, title-ix@ufl.edu.

**Sexual Assault Recovery Services (SARS)**, Student Health Care Center, 392-1161. 

**University Police Department** at 392-1111 (or 9-1-1 for emergencies), or http://www.police.ufl.edu/. 


## Academic Resources

**E-learning technical support**, 352-392-4357 (select option 2) or e-mail to Learning-support@ufl.edu. https://lss.at.ufl.edu/help.shtml.

**Career Resource Center**, Reitz Union, 392-1601.  Career assistance and counseling. https://www.crc.ufl.edu/.

**Library Support**, http://cms.uflib.ufl.edu/ask. Various ways to receive assistance with respect to using the libraries or finding resources.

**Teaching Center**, Broward Hall, 392-2010 or 392-6420. General study skills and tutoring. https://teachingcenter.ufl.edu/.

**Writing Studio, 302 Tigert Hall**, 846-1138. Help brainstorming, formatting, and writing papers. https://writing.ufl.edu/writing-studio/.

**Student Complaints Campus:** https://care.dso.ufl.edu.

**On-Line Students Complaints:** http://www.distance.ufl.edu/student-complaint-process.

**<h1 align="center">Any Questions?</h1>**

## Git Demonstration

To get familiar with Git, the best thing to do is *practice*!

For beginner Git users, I recommend watching one or a few Git tutorials:
* Git bootcamp: https://help.github.com/categories/bootcamp/
* Tutorials: https://www.atlassian.com/git/tutorials/
* Interactive Introduction: https://try.github.io/

### How to ```clone``` a repository

You can use **Git Bash** to clone a repo or use GitHub Desktop. **I will demonstrate how to do it using GitHub Desktop.**

Let's open our organization: https://github.com/Data-Science-for-ECE-F20

### Getting the latest edits from a repository - use ```git pull```

To ```pull``` from a repository, simply call ```git pull``` using Git Bash.

### How to manage files within a repo

The 4 most used Git commands are: ```git pull```, ```git add```, ```git commit``` and ```git push```. You can call these commands directly in the **Git Bash** console within the cloned repository on your machine.

**I will now demonstrate how I ```push``` this Notebook to the "Lectures" repository.**

# To prepare for next class

1. Install [Anaconda](https://www.anaconda.com/products/individual)
2. Install [Git](https://git-scm.com/downloads)
3. For beginner Git users, I recommend installing [GitHub Desktop](https://desktop.github.com/)
4. Watch the **"Introduction to programming with Python" video** I have uploaded to Canvas and follow along with the Jupyter Notebook I provided.
5. Take a look at the **Modules** page in Canvas and get familiar with a typical lecture layout: it includes readings and activities to help you study.
6. Create the [Short Assignment 0](https://ufl.instructure.com/courses/404371/assignments/4332677) repository. You will become a collaborator in the organization, and I will send you an invitation to become a member with full access to the lecture notes and other repositories.
    * You will **not** be able to see each other's private assignment repositories.
    * Our organization is also **private**.

**<h1 align="center">Any Questions?</h1>**

Feel free to email me afterwards or come talk with me during office hours: MWF 3-4 PM.