
Things to look up about the class, assignments, and more.


spu-bigdataanalytics-201/welcome


Welcome ❕

Big Data

You made it! This is the start of our journey for the next few months!

This introductory repository guides you through how we will run the class, the assignments, and participation. Each section below covers one of these topics.

How to Prepare for the First Class

Before the first class, set up the tools you will be using throughout the course. You need to be prepared with the following.

  1. Python
  2. Git
  3. SQL
  4. Integrated Development Environment (IDE)

Going over the list above won't take more than a few hours of your time.

Python

In this course, we use Python. You can use either standard Python or the Anaconda distribution, whichever you prefer.
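To confirm which interpreter you ended up with, a small check like the one below may help. It is only a sketch: the function name `describe_python` is made up for illustration, and the conda detection is a common heuristic (looking for a `conda-meta` folder under `sys.prefix`), not an official test.

```python
import os
import platform
import sys


def describe_python():
    """Return a short summary of the interpreter running this script."""
    return {
        "version": platform.python_version(),
        "executable": sys.executable,
        # Heuristic only: conda-managed environments usually contain a
        # "conda-meta" directory under sys.prefix.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


for key, value in describe_python().items():
    print(f"{key}: {value}")
```

If `looks_like_conda` prints `True`, you are most likely running Python from Anaconda or Miniconda; otherwise it is probably a standard installation.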

It is better to know Python syntax, learn a few new things, and sharpen your existing Python skills ahead of the assignments. For that, you can use the following resources.

Git

The assignments are stored and shared on GitHub, and we use Git to version and interact with our repositories.

My recommendation is to go over the following resources.

SQL

SQL is the heartbeat of your analytical life! You need to know it to some extent! Following are some links to check out.
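If you want to practice SQL without installing a database, Python's built-in `sqlite3` module is one option. The sketch below uses a made-up `scores` table with invented rows, purely for illustration, and runs a typical aggregate query.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, assignment INTEGER, score REAL)")

# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("ada", 1, 100), ("ada", 2, 90), ("alan", 1, 80), ("alan", 2, 70)],
)

# Average score per student, highest first.
rows = conn.execute(
    "SELECT student, AVG(score) AS avg_score "
    "FROM scores GROUP BY student ORDER BY avg_score DESC"
).fetchall()

for student, avg_score in rows:
    print(student, avg_score)

conn.close()
```

The same `SELECT ... GROUP BY ... ORDER BY` pattern carries over to most SQL engines.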

IDE

My favorite, bittersweet tool is Visual Studio Code. Use it and you'll love it!

There are alternatives, of course, and you are welcome to use those as well. The point is that an IDE will make your life much easier if you choose to use one!

Alternatives

Class Tools!

Following are some of the tools you can use for the assignments in this course.

| Tool | Owner | Description | How-to Guide |
| --- | --- | --- | --- |
| Google Colab | Google | Colaboratory is built on top of Jupyter Notebook. | basic_features_overview.ipynb, welcome.ipynb |
| Databricks Community Edition | Databricks | Databricks is a micro-cluster as well as a cluster manager and notebook environment. | FAQ, Login to Community Edition |
| Data Science Lab | Saint Peter's University (SPU) | SPU's computation resource is a cluster of workstations that can work together as one big system. Be proud of it! | Website, How do you ... |
| Your computer | You! | Your local computer, where you can use local tools to do your work. | ssh, PuTTY (for Windows users), Jupyter Notebooks, Anaconda Distribution |

Assignments

We use GitHub Classroom for assignments. How it works is described below.

  1. I give you a link.
  2. You click on the link, and you are hacked, and I demand a small ransom!
  3. Ignore the second step. You click the link, and it automatically creates a repository for you under our GitHub organization.
  4. You download this repository using git clone https://github.com/..../welcome.git.
  5. You work on the assignment, make a few commits, and git push them to GitHub.
  6. When you are done with the assignment, you go back to your repository, download it, and upload it to Blackboard.
  7. I read it after the deadline passes, and give you a big 0.
  8. Ignore the 7th step. You will be graded.
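The clone, commit, and push steps above can be sketched as a small Python helper. This is an illustration only: the function names `assignment_commands` and `run_commands` and the repository URL are hypothetical placeholders, not part of the course tooling. It prints the git commands by default and only executes them when `dry_run=False`.

```python
import shlex
import subprocess


def assignment_commands(repo_url, commit_message):
    """Build (working_directory, argv) pairs for one clone-commit-push cycle."""
    # git clone creates a folder named after the repository.
    folder = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if folder.endswith(".git"):
        folder = folder[: -len(".git")]
    return [
        (None, ["git", "clone", repo_url]),
        (folder, ["git", "add", "--all"]),
        (folder, ["git", "commit", "-m", commit_message]),
        (folder, ["git", "push"]),
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Hypothetical URL: replace it with the repository GitHub Classroom creates for you.
commands = assignment_commands(
    "https://github.com/example-org/example-assignment.git",
    "Finish task 1",
)
run_commands(commands, dry_run=True)
```

In practice you would do your work between the clone and the commit, so typing the same commands by hand in a terminal is just as good.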

Final Project

Your final project will be a data science task, and the goal is to analyze a large dataset using Apache Spark.

The task is an open discussion, and you are required to participate in this discussion!

At the end of this course, we will replace this line with the task we come up with for our final project!

Grading

Grading will be based on the syllabus. We will have assignments, a midterm, and a final project.

Your final letter grade will be calculated using the following function. You can test this code by starting Python from this folder.

from grader import calculate_letter_grade

# this can be you!
grade = calculate_letter_grade(
    assignments=[100, 100, 100, 90, 60, 100],
    final_project=100,
    midterm=90,
    return_grade=True
)

# gets A (97)
print(grade)
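The repository's `grader` module is not reproduced here. To give a feel for how such a calculation might work, here is a minimal sketch of a weighted-average grader. Everything in it is an assumption: the function name `sketch_letter_grade`, the weights, and the letter cutoffs are placeholders, so it will not reproduce the 97 from the example above. The syllabus and `grader.py` remain the source of truth.

```python
def sketch_letter_grade(assignments, midterm, final_project, weights=(0.4, 0.3, 0.3)):
    """Hypothetical weighted-average grader; NOT the course's grader.py.

    weights is (assignments, midterm, final_project) and is an assumed split.
    """
    w_assignments, w_midterm, w_final = weights
    assignment_avg = sum(assignments) / len(assignments)
    score = (
        w_assignments * assignment_avg
        + w_midterm * midterm
        + w_final * final_project
    )
    # Assumed cutoffs; check the syllabus for the real ones.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter, round(score, 2)
    return "F", round(score, 2)


letter, score = sketch_letter_grade(
    assignments=[100, 100, 100, 90, 60, 100],
    midterm=90,
    final_project=100,
)
print(letter, score)
```

Changing the `weights` tuple shows how much a single low assignment score moves the total under different splits.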

Your To-Do List for This Task

Conclusion

That's it! To a wonderful semester,

Happy coding!

By @metinsenturk.
