hathawayj/gitandgithub

Git and GitHub for Data Science

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. Josh Wills

The dark side of that quote is real! Data scientists don't program as well as software engineers, and they are also reasonably soft when it comes to understanding the larger field of statistical analysis. We can improve over time. However, the breadth of our work often keeps us from specializing in either technical area, because we are also expected to build expertise in other domains. If we did specialize, we would be called statisticians or software engineers.

My school/professional journey

How do we demonstrate our data savviness and programming experience?

Github! Data scientists need to demonstrate their coding experience and data depth. Github provides us the social space to demonstrate these skills.

It is no exaggeration to say that git (and other forms of version control software) underlie the entire world of open-source software, and are central to the operation of nearly every tech company on the planet. ... OK, now the bad news: learning git kinda sucks. I mean, it’s not painful like performing an appendectomy on yourself without anesthesia, and it’s not hard like quantum mechanics or geometric topology; it’s definitely something anyone can learn. ref

Git

Github

GitHub is key to your employment as a Data Scientist.

This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.

Most of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, co-founder of HasGeek, a platform to build and discover peer groups.

Read more here.

A primary differentiator between an analyst and a data scientist

It signals that you are a programmer as well as an analyst.

Github is our version control, and we have everything on Github. Definitely having strong git experience is very helpful. The way my team is using it is through forking. We fork the main file and then pull from and to it to update the code.

Keaton Sant, Data Scientist at John Deere
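The fork-based workflow described in the quote can be sketched roughly as follows. This is only an illustration, not the team's actual setup: the remote name `upstream`, the default branch name `main`, and the helper names `sync_fork_commands` and `sync_fork` are all assumptions made for the example.

```python
import subprocess


def sync_fork_commands(repo_dir: str, upstream_url: str, branch: str = "main") -> list[list[str]]:
    """Build the git commands that pull upstream changes into a fork.

    Assumes the local clone already has `origin` pointing at your fork
    and that `branch` (hypothetically `main`) exists on both remotes.
    """
    git = ["git", "-C", repo_dir]
    return [
        # One-time setup: register the original repository as `upstream`.
        git + ["remote", "add", "upstream", upstream_url],
        # Download the latest upstream history without changing local files.
        git + ["fetch", "upstream"],
        # Make sure we are on the branch we want to update.
        git + ["checkout", branch],
        # Merge the upstream branch into the local branch.
        git + ["merge", f"upstream/{branch}"],
        # Publish the updated branch to your fork.
        git + ["push", "origin", branch],
    ]


def sync_fork(repo_dir: str, upstream_url: str, branch: str = "main") -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in sync_fork_commands(repo_dir, upstream_url, branch):
        subprocess.run(cmd, check=True)
```

The `remote add` step fails if `upstream` is already registered, so on later syncs you would skip it. Changes usually flow back to the main repository through a pull request opened on GitHub rather than a direct push.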

Is it going to hurt?

Yes.

It feels weird at first but quickly becomes second nature. For students, the pain is usually short-lived because they work primarily in their own repositories. The bad news comes if you use GitHub to work with other people or to coordinate your own work from multiple computers: after you recover from the initial setup, git will crush you again with merge conflicts. And this is not one-time pain; this could be a dull ache for a long time. The best remedy is prevention, but it also helps to understand how to back out of tricky situations and tackle them on your own terms.
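One way to back out of a conflicted merge is sketched below, under the assumption that you are driving git from Python. The helper names `conflicted_files` and `abort_merge` are hypothetical; the git commands themselves (`git status --porcelain` and `git merge --abort`) are standard.

```python
import subprocess

# Two-letter status codes that `git status --porcelain` uses for unmerged paths.
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def conflicted_files(porcelain_output: str) -> list[str]:
    """Return the paths that git reports as unmerged (in conflict).

    Each porcelain line looks like "XY path". Paths containing spaces or
    special characters may be quoted by git; this sketch does not unquote them.
    """
    files = []
    for line in porcelain_output.splitlines():
        if len(line) > 3 and line[:2] in UNMERGED_CODES:
            files.append(line[3:])
    return files


def list_conflicts(repo_dir: str) -> list[str]:
    """Ask git for the current status and extract the conflicted paths."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    )
    return conflicted_files(result.stdout)


def abort_merge(repo_dir: str) -> None:
    """Abandon an in-progress merge and return to the pre-merge state."""
    subprocess.run(["git", "-C", repo_dir, "merge", "--abort"], check=True)
```

Calling `list_conflicts` first shows how large the problem is; `abort_merge` lets you step back and retry the merge later, on your own terms.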

Managing a project via Git/GitHub is much more like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. ref

Github and education guidelines

  1. Don't post assignments
  2. Do post unique code and projects using skills from your classes
  3. Use private repos with a student education account to manage your coursework
  4. Use it to communicate

Managing your Github space

If you are trying to get a job, then your Github space should be organized. Take the time to make this space your coding ‘social media’ where people see the best side of your work.

  • Make your landing page stand out by managing your profile README. Use this guide for additional inspiration.
  • Track your work and share it with the world.
  • Organize and document your repositories. Here are some great examples.
  • Find a project you could support (long-term goal).

Github's other tools

GitHub aims to be the social communication tool for coders (reference). Versioning and sharing code is the core. However, ignoring the other available tools is not wise.

Your personal data projects workflow

You don't need to make these projects complicated. These projects are built to show your work using the skills you have developed during school. I would make sure that these personal projects are presentable. You want to demonstrate your creativity. You could use the following links to find a new data set.

Let's go through the prAcess with this Github repo

  1. Fork repo on Github
  2. Clone repo to local computer
  3. Fix the spelling error above and save the file
  4. Add a new file called notes.md
  5. Add or stage your changes
  6. Commit your work
  7. Push to github
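Steps 2 through 7 above can be sketched as a small script. This is a minimal illustration, not the only way to do it: the fork itself (step 1) is created on GitHub, the spelling fix (step 3) is done by hand in an editor, and the helper names, the commit message, and any fork URL you pass in are assumptions for the example.

```python
import subprocess
from pathlib import Path


def clone_command(fork_url: str, repo_dir: str) -> list[str]:
    """Step 2: clone your fork to the local computer."""
    return ["git", "clone", fork_url, repo_dir]


def publish_commands(repo_dir: str, message: str) -> list[list[str]]:
    """Steps 5-7: stage, commit, and push the changes."""
    git = ["git", "-C", repo_dir]
    return [
        git + ["add", "--all"],             # step 5: stage every change
        git + ["commit", "-m", message],    # step 6: record the snapshot
        git + ["push", "origin", "HEAD"],   # step 7: push the current branch
    ]


def run(cmd: list[str]) -> None:
    """Run one git command and raise if it fails."""
    subprocess.run(cmd, check=True)


def run_workflow(fork_url: str, repo_dir: str, message: str) -> None:
    """Walk through steps 2-7 for a fork at `fork_url` (hypothetical URL)."""
    run(clone_command(fork_url, repo_dir))
    # Step 3 is manual: open the README in an editor and fix the typo.
    # Step 4: add a new file called notes.md.
    (Path(repo_dir) / "notes.md").write_text("# Notes\n", encoding="utf-8")
    for cmd in publish_commands(repo_dir, message):
        run(cmd)
```

For example, `run_workflow("https://github.com/YOUR_USERNAME/gitandgithub.git", "gitandgithub", "Fix typo and add notes.md")` would run the whole sequence, where `YOUR_USERNAME` is a placeholder for your own account. The same commands can be typed directly in a terminal; wrapping them in functions only makes the order of the steps explicit.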

About

Some notes on Git and Github
