Hello, and welcome to an intentionally fun and light-hearted approach to learning data science for beginners! In this course, you will:
- Set up your environment like a proper data scientist
- Learn the basics of the Python language for data science
- Import popular Python libraries and data files
- Perform some Exploratory Data Analysis (EDA)
- Complete some Data Visualization
We will be exploring this dataset created for fun by Chris Albon (github.com/chrisalbon)! Yes, we're going to have a lot of fun with this one.
This course is designed for complete beginners, but knowing the following will help tremendously:
- Some basic Python code. I strongly recommend going through the Codecademy tutorial or Learn Python The Hard Way.
- An understanding of working in terminals or command prompts, especially navigating folders and files. Here's an example.
- An understanding of how GitHub works, or just knowing how to download a text file to the right location. You might as well download the entire repository, but just in case, here's all you need for this lesson.
If you have any suggestions on ways to make this class even more accessible, please suggest them!
To begin, we need to install some software on your computer that will run the following:
- Python (we'll be working with Python 3, but Python 2 should be just fine, too)
- Jupyter Notebooks, a great tool for iterating quickly in Python
- Various Python libraries (we'll be using NumPy, Pandas, and Matplotlib)
Fortunately, we can do all of the above simply by installing one thing: Anaconda from Continuum Analytics.
We won't be sharing specific installation steps because they vary per computer.
https://continuum.io/downloads
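Once the installation finishes, a quick check like the one below can confirm that Python can see the libraries listed above. This is only a minimal sketch: the function name check_environment is made up for illustration, and it assumes the libraries are importable under their usual module names (numpy, pandas, matplotlib).

```python
import importlib
import sys


def check_environment(packages=("numpy", "pandas", "matplotlib")):
    """Return a dict mapping each name to its version, or None if it is missing.

    check_environment is a hypothetical helper for this lesson, not part of
    Anaconda or any library.
    """
    # Record the running Python version first, e.g. "3.x.y".
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most libraries expose __version__; fall back to "unknown" if not.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            # The library is not installed in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        status = version if version is not None else "NOT FOUND"
        print(f"{name}: {status}")
```

If any line prints NOT FOUND, the library is not available to the Python you are running, which usually means the Anaconda installation did not complete or a different Python is being picked up first.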
- On this repository's GitHub page, go to the top-right corner and find the "Clone or Download" button. Click on it.
- Click on "Download ZIP" and save that file somewhere on your computer.
- Unzip the file you just downloaded and remember where that directory is.
- Open up your Command Prompt, Terminal, Git, etc.
- Navigate to the folder/directory where you downloaded this GitHub repository.
- Type "jupyter notebook" into the command prompt and wait a moment.
If a webpage opens that says 'Jupyter' at the top, you're ready to move forward!
At this point, you can open up your own Jupyter Notebook or, if you must, use this pre-populated one here.
Why are we using Python instead of R or some other language? Ah, this old debate. Although others have a stronger opinion than I do on the matter, I've just happened to learn most of my data science via Python and discovered a great community around it. Generally, I code in JavaScript or Python, and anyone will tell you that it doesn't really matter what you code in as long as you know how to code.
If there's enough interest in converting this lesson into R or another language, perhaps I'll do it... (but probably not).
Lee Ngo is a self-described 'Education Technology Community Architect,' and is perpetually passionate about inclusivity, engagement, and empathy in spaces of professional advancement. Lee serves as national data science evangelist for Metis. Previously, Lee served as an evangelist for Galvanize based in Seattle. Previously he worked for UP Global (now Techstars) and founded his own ed-tech company in Pittsburgh, PA. Lee believes in learning by doing, engaging and sharing, and he teaches code through a combination of visual communication, teamwork, and project-oriented learning.
You can email him at lee-dot-ngo-at-gmail-dot-com for any further questions.
Disclaimer: This lesson is entirely open-source, unaffiliated with any other entities, and intended for educational and entertainment purposes. The data used remains unchanged from its initial source out of respect to its author and the inspired material. Please feel free to fork, clone, remake, sample, and enjoy as you please under the MIT License.