MSDS593 Exploratory data analysis (EDA) and Visualization at the University of San Francisco

This course introduces students to the key...

Administrivia

INSTRUCTORS. Yannet Interian and Terence Parr. We are professors in the MS in data science program; Terence was also the founding director of MSDS at USF. Please call us Yannet and Terence or Professor.

OFFICE HOURS

  • Yannet 4-5pm on Tuesdays.
  • Terence is generally available on demand via Slack or email.

SPATIAL COORDINATES:

  • All classes are remote but live online, courtesy of COVID-19.

TEMPORAL COORDINATES. Wed 25 August to Fri 2 October

Please note: California time is GMT-7. All lectures will be held over Zoom and recorded; the recordings will appear on Canvas.

Week one:

  • Lectures are live but recorded for international students:

    • US timezone: Wed&Fri 2-4PM California time
  • Live sessions for international students

    • Euro time zone: Thur 9-10AM California time
    • Asia time zone: Wed&Fri 7-8PM California time

Following weeks:

  • Lectures are live but recorded for international students:

    • US timezone: Tue&Thur 2-4PM California time
  • Live sessions for international students

    • Euro time zone: Wed&Fri 9-10AM California time
    • Asia time zone: Tue&Thur 7-8PM California time
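Since every time above is given in California time (GMT-7), students elsewhere need to convert to their own clocks. A minimal sketch using only the standard library follows. The calendar date, the helper name `to_local`, and the target offsets (UTC+2 and UTC+8) are illustrative assumptions, not part of the course schedule.

```python
from datetime import datetime, timedelta, timezone

# California time during the course, as stated above (GMT-7).
CALIFORNIA = timezone(timedelta(hours=-7))

def to_local(hour, utc_offset_hours, year=2020, month=9, day=1):
    """Convert an hour on a California clock to another fixed UTC offset.

    The default date is an arbitrary placeholder, not an actual class date.
    """
    california_time = datetime(year, month, day, hour, 0, tzinfo=CALIFORNIA)
    target = timezone(timedelta(hours=utc_offset_hours))
    return california_time.astimezone(target)

# Hypothetical examples: a 9AM session seen from UTC+2,
# and a 7PM session seen from UTC+8 (which lands on the next calendar day).
euro_start = to_local(9, 2)
asia_start = to_local(19, 8)
print(euro_start.strftime("%Y-%m-%d %H:%M"))
print(asia_start.strftime("%Y-%m-%d %H:%M"))
```

Fixed offsets keep the sketch simple but ignore daylight saving changes, so double-check any session that falls near a clock change in your region.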

INSTRUCTION FORMAT. Live class runs for 2 hours, 2 days/week. We encourage instructor-student interaction during lecture, so please speak up in Zoom. We'll often mix in mini-exercises / labs during class. All programming will be done in the Python 3 programming language, unless otherwise specified.

For the main lectures, we will interleave Yannet on Tuesdays and Terence on Thursdays. Yannet will run the Tuesday evening and Wednesday morning sessions for international students, and Terence will run the Thursday evening and Friday morning sessions.

COURSE BOOKS

We will be using the following books available for free via the USF online library:

To get access, start at USF and then you can jump to the various books.

PROFESSIONALISM

The following items are even more important because all of us will be remote this Fall:

  • Showing respect for your classmates and your professor
  • Getting to class on time every time
  • No cellphones, email, social media, slack, texting during the class
  • Turn off all of your various notifications so you are not distracted
  • Turn on your webcam in Zoom

Student evaluation

Artifact                          Grade weight   Due date
professionalism/attendance/labs   20%
3 Homeworks                       20%
Quiz 1                            10%            9/03
Quiz 2                            10%            9/10
Quiz 3                            10%            9/24
Quiz 4                            10%            10/01
Group Project                     20%            09/29
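
To see how these weights combine into a final score, here is a small sketch. The scores are made up, the function name `final_score` is hypothetical, and treating the three homeworks as a single 20% component is an assumption based on the table.

```python
# Grade weights from the table above, as fractions of the final score.
WEIGHTS = {
    "professionalism/attendance/labs": 0.20,
    "homeworks": 0.20,  # assumed: all 3 homeworks together
    "quiz 1": 0.10,
    "quiz 2": 0.10,
    "quiz 3": 0.10,
    "quiz 4": 0.10,
    "group project": 0.20,
}

def final_score(scores):
    """Weighted average of per-artifact scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, for illustration only: 90 everywhere except one quiz.
example_scores = {name: 90 for name in WEIGHTS}
example_scores["quiz 2"] = 70

print(round(sum(WEIGHTS.values()), 2))        # the weights should total 1.0
print(round(final_score(example_scores), 1))
```

Because each quiz carries 10%, dropping 20 points on one quiz lowers the final score by 2 points.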

Each project has a hard deadline and only those projects working correctly before the deadline get credit.

I reserve the right to change projects until the day they are assigned.

Grading standards. I consider an A grade to be above and beyond what most students have achieved. A B grade is an average grade for a student or what you could call "competence" in a business setting. A C grade means that you either did not or could not put forth the effort to achieve competence. Below C implies you did very little work or had great difficulty with the class compared to other students.

Getting started

Python environment

Start by watching the video "A broad overview of python and tools used in our MSDS program" to make sure you have the proper Python environment. Make sure you have a very recent version of Anaconda installed. From the Terminal app, you should see the following (with parrt replaced by your user ID or name):

$ which python
/Users/parrt/opt/anaconda3/bin/python
$ python
Python 3.8.3 (default, Jul  2 2020, 11:26:31) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Type which python and then python at the $ prompt to get those results. The output shows which Python you're running, which should be Anaconda-based:

/Users/parrt/opt/anaconda3/bin/python

and version 3.8 or above. Note that / is the path separator character and so, for example, /Users/parrt/opt means the Users directory under the root of the disk, parrt under that directory, and finally opt directory under that.
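
The same check can be done from inside Python. The sketch below uses only `sys`. The function name `check_environment` is made up for this example, and looking for "anaconda" in the interpreter path is a heuristic that assumes a default Anaconda install location like the one shown above.

```python
import sys

def check_environment(min_version=(3, 8)):
    """Report whether this interpreter meets the version requirement
    and whether its path looks like an Anaconda install."""
    version_ok = sys.version_info[:2] >= min_version
    # Heuristic only: default Anaconda installs have "anaconda" in their path.
    # sys.executable can be empty or None in unusual setups, hence the fallback.
    looks_like_anaconda = "anaconda" in (sys.executable or "").lower()
    return version_ok, looks_like_anaconda

version_ok, looks_like_anaconda = check_environment()
print("Python executable:", sys.executable)
print("Version 3.8 or above:", version_ok)
print("Path mentions Anaconda:", looks_like_anaconda)
```

If the last line prints False, the python on your path may not be the Anaconda one, even if Anaconda is installed.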

Directory structure

We strongly recommend that you create an appropriate directory (folder) structure to organize the artifacts you create for classes and the resources you need. When dealing with programs and data, the names and structure of your directories become part of the program and must be precise. Please remember that upper and lowercase letters are meaningfully different to the operating system and Python.

Creating an appropriate structure can be done manually using the OS X Finder, but it's better if you get used to using the terminal. Launching the terminal and executing the following commands creates a reasonable directory structure, but you can change it to suit your needs.

cd ~                # Jump to your home directory 
mkdir classes       # Make a subdirectory (folder)
cd classes          # Jump into the classes directory
mkdir msds593
cd msds593
mkdir homework
mkdir labs          # Start jupyter in this dir; put notebooks here
cd labs
mkdir data          # Here is where you download data used by notebooks
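
For reference, the same layout can be created from Python with `pathlib`. This is only a sketch: the function name `make_course_dirs` is made up, and the example builds the tree under a temporary directory rather than your real home directory so that it is safe to run. Pass `Path.home()` to create it under `~`.

```python
from pathlib import Path
import tempfile

def make_course_dirs(root):
    """Create classes/msds593/homework and classes/msds593/labs/data under root."""
    course = Path(root) / "classes" / "msds593"
    # parents=True creates intermediate dirs; exist_ok=True makes reruns harmless.
    (course / "homework").mkdir(parents=True, exist_ok=True)
    (course / "labs" / "data").mkdir(parents=True, exist_ok=True)
    return course

# Demonstrate in a throwaway directory; use make_course_dirs(Path.home()) for real.
with tempfile.TemporaryDirectory() as tmp:
    course = make_course_dirs(tmp)
    created = sorted(p.relative_to(course).as_posix() for p in course.rglob("*"))
    print(created)
```

Unlike plain `mkdir`, this version does not fail if a directory already exists.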

In the Finder afterwards, the structure looks something like this:

You will be doing exercises during class in Terence's lectures and you should create a new notebook for each lecture because you must submit a PDF printout of the exercises associated with each lecture.

Syllabus

We will be interleaving lectures from Yannet and Terence, but we can separate the topics into visualization, pandas for data manipulation, and matplotlib for basic plotting.

Visualization (Yannet)

  • value of visualization; the importance of context
  • introduction to visualization design; choosing an effective visual
  • visual perception and principles of design; clutter is your enemy!
  • multivariate and time series data visualization
  • visualizing trees, maps, networks and text
  • storytelling with data

Viz Implementation and EDA (Terence)

Making plots with matplotlib

Topics:

  • plots: bar chart, histogram, scatter, line, box, strip, violin, bubble plot
  • images (MNIST)
  • displaying matrices / heatmaps
  • overlaid plots for comparing variables; arrays of plots; shared axes
  • drawing lines, shapes, text, annotations
  • altering axes, labels; titles
  • misc: legends, colorbar, linewidth, line style, colors, alpha channel
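
As a taste of several topics in this list (arrays of plots, shared axes, titles, axis labels, legends, alpha), here is a minimal sketch using randomly generated data. The data and variable names are made up for illustration, and the `Agg` backend is chosen only so that the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the figure is repeatable
x = rng.normal(size=200)         # made-up data
y = 2 * x + rng.normal(size=200)

# An array of two plots side by side that share the x axis.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3), sharex=True)

ax_hist.hist(x, bins=20, color="steelblue", alpha=0.7, label="x")
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")
ax_hist.legend()

ax_scatter.scatter(x, y, s=10, alpha=0.5, label="y vs x")
ax_scatter.set_title("Scatter")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.legend()

fig.tight_layout()

# Save to an in-memory buffer; pass a filename to savefig to write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a notebook, the figure displays inline and the buffer step is unnecessary.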

Pandas

Topics:

  • fundamentals: key data types; DataFrame vs. Series; relationship to NumPy; dealing with NaN for missing elements vs. empty string; categorical vs. numerical
  • selecting, slicing, method chaining, indexes
  • sorting, removing duplicates, shuffle, sample
  • dates, strings
  • pattern matching during selection
  • aggregation, grouping, binning, quantiles
  • map function or dictionary to Series (apply?)
  • merging/joining/stacking
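
A few of these topics in miniature, as a sketch on a tiny made-up table (the column names and values are invented): missing values as NaN versus empty strings, selection with a boolean mask, and grouping with aggregation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["SF", "SF", "NYC", "NYC", ""],
    "price": [10.0, np.nan, 30.0, 50.0, 20.0],
})

# NaN marks a missing number; an empty string is a present but empty value,
# so isna() does not flag it.
n_missing_price = int(df["price"].isna().sum())
n_empty_city = int((df["city"] == "").sum())

# Select rows with a boolean mask, then group and aggregate.
known = df[df["city"] != ""]
mean_price = known.groupby("city")["price"].mean()  # mean() skips NaN by default

print(n_missing_price, n_empty_city)
print(mean_price)
```

Note that the SF mean is computed from the single non-missing price, which is why it matters whether a gap is encoded as NaN or as some placeholder value.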

Administrative details

ACADEMIC HONESTY. You must abide by the copyright laws of the United States and academic honesty policies of USF. You may not copy code from other current or previous students. All suspicious activity will be investigated and, if warranted, passed to the Dean of Sciences for action. Copying answers or code from other students or sources during a quiz, exam, or for a project is a violation of the university’s honor code and will be treated as such. Plagiarism consists of copying material from any source and passing off that material as your own original work. Plagiarism is plagiarism: it does not matter if the source being copied is on the Internet, from a book or textbook, or from quizzes or problem sets written up by other students. Giving code or showing code to another student is also considered a violation.

The golden rule: You must never represent another person’s work as your own.

If you ever have questions about what constitutes plagiarism, cheating, or academic dishonesty in my course, please feel free to ask me.

All persons with common code are likely to be considered at fault.

USF policies and legal declarations

Students with Disabilities

If you are a student with a disability or disabling condition, or if you think you may have a disability, please contact USF Student Disability Services (SDS) for information about accommodations.

Behavioral Expectations

All students are expected to behave in accordance with the Student Conduct Code and other University policies.

Academic Integrity

USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University's Honor Code.

Counseling and Psychological Services (CAPS)

CAPS provides confidential, free counseling to student members of our community.

Confidentiality, Mandatory Reporting, and Sexual Assault

For information and resources regarding sexual misconduct or assault, visit the Title IX coordinator or USF's Callisto website.
