You made it! This is the start of our journey for the next few months!
This introductory repository guides you through how we will run the class, the assignments, and participation. Each section below covers one of these topics.
Before the first class, prepare the tools you will be using throughout the course. You will need the following:
- Python
- Git
- SQL
- Integrated Development Environment (IDE)
Going over the list above shouldn't take more than a few hours of your time.
In this course, we use Python. You can use either the standard Python distribution or the Anaconda distribution, whichever you prefer.
It helps to know Python syntax and to sharpen your existing Python skills ahead of the assignments. The following resources can help.
- Learn the syntax, W3Schools Python Tutorial
- Online Tool to Learn Python, learnpython.org
- A broader place to learn Python, GeeksforGeeks
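To confirm that your Python installation works, a quick check like the one below can help. It is only a minimal sketch: the minimum version (3.8) and the helper name `python_is_recent` are assumptions for illustration, not course requirements.

```python
import sys


def python_is_recent(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`.

    The (3, 8) default is an assumed threshold, not a course requirement.
    """
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # Print the interpreter version and whether it meets the assumed minimum.
    print("Python", sys.version.split()[0])
    print("Recent enough:", python_is_recent())
```

Save it to a file and run it with the same `python` command you plan to use for the assignments, so you know which interpreter you are actually getting.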
The assignments are stored and shared on GitHub, and we use Git to version and interact with our repositories.
I recommend going over the following resources.
- Your first steps towards Git, Learn to use Git
- Learn GitHub and Git using GitHub Lab, GitHub Lab
- Useful to understand how Git works, Git Cheat Sheet
- Useful to understand how GitHub works, GitHub Flow
- Useful for your README.md files, GitHub Flavored Markdown Cheat Sheet
- Pro tips from a data scientist at GitHub, Tips, tricks, hacks, and secrets from Alyson La
- A simple git learning experience with a desktop app, Git-it (Desktop App)
SQL is the heartbeat of your analytical life! You need to know it to some extent! Here are some links to check out.
- Learn the syntax, W3Schools SQL Tutorial
- Interactive Online SQL Learning Tool, SQLBolt
- Free class on SQL, Codecademy SQL
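If you want to try SQL without installing a database server, Python's built-in `sqlite3` module is one option. The sketch below is only an illustration: the `students` table, its rows, and the helper name `average_score_by_major` are made up for the example.

```python
import sqlite3


def average_score_by_major(rows):
    """Load (name, major, score) rows into an in-memory SQLite database
    and return the average score per major, ordered by major name."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE students (name TEXT, major TEXT, score REAL)")
        conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT major, AVG(score) FROM students GROUP BY major ORDER BY major"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data, for illustration only.
sample_rows = [
    ("Ada", "Data Science", 90.0),
    ("Grace", "Data Science", 80.0),
    ("Linus", "Computer Science", 70.0),
]

if __name__ == "__main__":
    for major, average in average_score_by_major(sample_rows):
        print(major, average)
```

The database lives only in memory, so you can change the query and rerun it as often as you like without leaving files behind.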
My favorite, bittersweet tool is Visual Studio Code. Use it and you'll love it!
There are alternatives, of course, and you are welcome to use those as well. The point is that an IDE will make your life much easier if you choose to use one!
Alternatives
- Atom
- Sublime
- Notepad++ (I use it as an advanced Notepad, rather than an IDE)
- And many, many more over here.
The following are some of the tools you can use for the assignments in this course.
Tool | Owner | Description | How-to Guide |
---|---|---|---|
Google Colab | Google | Colaboratory is built on top of Jupyter Notebook. | basic_features_overview.ipynb, welcome.ipynb |
Databricks Community Edition | Databricks | Databricks is a micro-cluster as well as a cluster manager and notebook environment. | FAQ, Login to Community Edition |
Data Science Lab | Saint Peter's University (SPU) | SPU's computation resource is a cluster of workstations that can work together as one big system. Be proud of it! | Website, How do you ... |
Your computer | You! | Your local computer, where you can use local tools to do your work. | ssh, PuTTY (for Windows users), Jupyter Notebooks, Anaconda Distribution |
We use GitHub Classroom for assignments. Here is how it works.
1. I give you a link.
2. You click on the link, you are hacked, and I demand a small ransom!
3. Ignore the second step. You click the link, and it automatically creates a repository for you under our GitHub page.
4. You download this repository using `git clone https://github.com/..../welcome.git`.
5. You work on the assignment, make a few commits, and `git push` them to GitHub.
6. When you are done with the assignment, you go back to your repository, download it, and upload it to Blackboard.
7. I read it after the deadline passes, and give you a big 0.
8. Ignore the 7th step, you will be graded.
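The commit-and-push part of the steps above can also be scripted. The sketch below is one way to do it from Python, assuming Git is installed and you have already cloned your repository. The helper name `commit_and_push` and the folder name in the usage line are hypothetical.

```python
import subprocess


def commit_and_push(repo_dir, message, run=subprocess.run):
    """Stage all changes in `repo_dir`, commit them, and push to GitHub.

    `run` defaults to subprocess.run; it is a parameter only so the
    commands can be inspected without touching a real repository.
    """
    commands = [
        ["git", "add", "--all"],           # stage every change in the folder
        ["git", "commit", "-m", message],  # record the staged changes
        ["git", "push"],                   # send the new commit to GitHub
    ]
    for command in commands:
        # check=True raises CalledProcessError if Git reports a failure.
        run(command, cwd=repo_dir, check=True)
    return commands
```

For example, `commit_and_push("welcome", "Finish part 1")` would run the three commands inside a local folder named `welcome`. Typing the same commands in a terminal works just as well.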
Your final project will be a data science task, and the goal is to analyze a large dataset using Apache Spark.
The task is an open discussion, and you are required to participate in this discussion!
At the end of this course, we will replace this line with the task we come up with for our final project!
Grading will be based on the syllabus. We will have assignments, a midterm, and a final project.
Your final letter grade will be calculated using the following function. You can test this code by starting Python from this folder.
```python
from grader import calculate_letter_grade

# this can be you!
grade = calculate_letter_grade(
    assignments=[100, 100, 100, 90, 60, 100],
    final_project=100,
    midterm=90,
    return_grade=True
)

# gets A (97)
print(grade)
```
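The real calculation lives in `grader` and follows the syllabus. If you are curious how a weighted letter grade can be computed in general, here is a rough sketch. The weights (40/30/30), the cutoffs, and the name `sketch_letter_grade` are all assumptions for illustration, so it will not reproduce the numbers from the course's grader.

```python
def sketch_letter_grade(assignments, midterm, final_project):
    """Illustrative weighted grade; NOT the course's grader.

    Assumed weights: assignments 40%, midterm 30%, final project 30%.
    Assumed cutoffs: A >= 90, B >= 80, C >= 70, D >= 60, otherwise F.
    """
    assignment_average = sum(assignments) / len(assignments)
    score = 0.4 * assignment_average + 0.3 * midterm + 0.3 * final_project
    # Walk the cutoffs from highest to lowest and return the first match.
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter, round(score, 2)
    return "F", round(score, 2)


if __name__ == "__main__":
    print(sketch_letter_grade([100, 100, 100, 90, 60, 100], 90, 100))
```

For your actual grade, always rely on `calculate_letter_grade` and the syllabus.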
- [x] You will mark the things you have done in this list, like this one.
- [ ] Go over this README file in its entirety.
- [ ] Go over the links in the section on things you need to prepare.
- [ ] Review the commit history of this repository.
- [ ] Try the grading function in the grading section.
- [ ] Participate in the final project discussion!
- [ ] Follow the steps in the assignments section, and push your final changes to your repository on GitHub.
That's it! To a wonderful semester,
Happy coding!
By @metinsenturk.