
Skills Learned from VIP Class

Rodd Talebi edited this page Apr 9, 2021 · 2 revisions

I thought I'd put together a cheat sheet of things that I think undergraduate students should mention as skills they have learned from this project. This should give you insight into what kinds of things companies are looking for, and the types of things you will be doing while working on the ezCGP codebase.

But at the same time, I want to stress that you shouldn't feel like you need to be an expert in all of these things; from my experience, it shows great maturity to say, "well I can't say that I am at the level where I can readily teach others about this tool, but I have used it on 'X project' and should be able to start using it on Day 1 of the job with minimal overhead to remind myself of the basics". Rule of thumb: give the non-technical recruiter binary answers where "Yes" is "Ya I've heard of this tool and vaguely understand what it does", and "No" is "Never heard of it"; and be as honest and open as possible with the employee who is hiring you to work with them...don't sell yourself short ("I'm not an expert, but I am confident that I can learn it quickly") but be honest and LIKEABLE!

Python

Conda

  • What is a conda environment, and why is it useful to use one?
  • How do you use one?

Versions

Different versions of python or modules can give different results and could have different capabilities available to you. It is important to know how to find which version you have, and to make sure you are reading the documentation for that version.

  • Can you find out which version of python you are using?

In the terminal call $ python --version

Linux users can see where their python executable is stored, which usually hints at the version and the environment it belongs to: $ which python

Inside an interactive python environment/shell, you can check with print(sys.version) after running import sys
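The same checks can be collected in one place from inside python. This is only a minimal sketch: the helper name describe_environment is made up for illustration, and numpy and pandas are just example package names that may or may not be installed in your environment.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("numpy", "pandas")):
    """Collect the interpreter version, its location, and package versions.

    describe_environment is a hypothetical helper, not part of ezCGP.
    """
    info = {
        "python_version": platform.python_version(),
        # The path shows which interpreter (and which environment) is running.
        "executable": sys.executable,
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            info["packages"][name] = None
    return info


if __name__ == "__main__":
    report = describe_environment()
    print("python", report["python_version"], "at", report["executable"])
    for name, version in report["packages"].items():
        print(name, version if version is not None else "not installed")
```

Knowing the exact version is what lets you pick the matching documentation page for that release.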

Read Documentation

Say I give you a python module you've never heard of and a method or class from it I think you should use...say something like Augmentor.Pipeline.Pipeline

  • Can you find the documentation for that method and understand what it is supposed to do?
  • If you don't find the documentation adequate, are you comfortable going to the source code (click the source link in the documentation) and reading it to see what it actually does?

Also, did you know that you can call help(some_method) in python and it will (if available) show the documentation for that method or module? Similarly, you can call dir(some_class) and it will list all the available class attributes and methods. For example, say it has been a while since I've used pandas and I just want a quick refresher on how to join two dataframes.

import pandas as pd
df = pd.DataFrame() #make an empty DataFrame
dir(df) # I can now see that 'join' is listed as a method of this class
help(df.join) # or help(pd.DataFrame.join) ...similarly help(df) or help(pd.DataFrame) and then search for join
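When the documentation is not enough and you want to read the source without leaving python, the standard inspect module can help. The sketch below uses the built-in json module instead of pandas only so that it runs without installing anything; public_names and first_doc_line are hypothetical helper names made up for this example.

```python
import inspect
import json


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def first_doc_line(obj):
    """Return the first line of the docstring, or an empty string if none."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""


print(public_names(json))           # a quick overview of what the module offers
print(first_doc_line(json.dumps))   # a one-line summary taken from the docstring

# getsource works for code written in python; it raises an error for
# functions implemented in C, so not every function can be read this way.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])       # the 'def' line with the full signature
```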

Debugging

This one is sponsored by Dr. Jason Zutty. Throw import pdb; pdb.set_trace() at any point in your code so that when you run your script, python will execute the code up to that point and then make it 'interact-able' so you can start investigating the code 'by hand'. As a warning, be careful with variable names that match the pdb Debugger Commands: at the pdb prompt, the command takes precedence over your variable of the same name; for example...

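import numpy as np
import pdb

row_count, col_count = 2, 3  # example sizes, pick your own

def do_thing(i, j):
    # placeholder for whatever you actually compute
    return i * col_count + j

output = np.empty((row_count, col_count))  # allocate once, before the loops
for i in range(row_count):
    for j in range(col_count):
        output[i, j] = do_thing(i, j)
        pdb.set_trace()  # pauses here on every iteration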
...typing j at the pdb prompt will not tell you where you are in the second for loop, because pdb reads it as its jump command. To see the variable, type p j or !j instead.
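If you want to convince yourself of this without an interactive session, pdb can be fed its commands from a string. This is a sketch under assumptions: inspect_loop_variable is a made-up function, and the scripted input simply stands in for what you would type at the (Pdb) prompt.

```python
import io
import pdb


def inspect_loop_variable():
    """Pause once inside a function and ask pdb to print the variable j."""
    j = 3  # deliberately named like pdb's j(ump) command
    scripted_input = io.StringIO("p j\ncontinue\n")  # stands in for the keyboard
    captured = io.StringIO()                         # stands in for the terminal
    debugger = pdb.Pdb(
        stdin=scripted_input,
        stdout=captured,
        nosigint=True,  # leave the Ctrl-C handler alone
        readrc=False,   # ignore any personal .pdbrc file
    )
    debugger.set_trace()  # pdb takes over here and runs the scripted commands
    return j, captured.getvalue()


value, transcript = inspect_loop_variable()
print(transcript)  # everything pdb wrote, including the answer to 'p j'
```

Here p j asks pdb to print the variable, while a bare j would have been treated as the jump command.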

Misc

  • Object-Oriented Programming: Abstract Base Classes and Inheritance
  • logging module
  • decorators
  • if __name__ == "__main__"
  • sys.path and importing local python scripts from another directory
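Several of these topics fit in one small script. The sketch below is only an illustration: Operator, Doubler, and log_call are hypothetical names invented for this example and are not taken from the ezCGP codebase.

```python
import functools
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


def log_call(func):
    """Decorator: log the function's name every time it is called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        logger.info("calling %s", func.__name__)
        return func(*args, **kwargs)
    return wrapper


class Operator(ABC):
    """Abstract base class: states what every subclass must implement."""

    @abstractmethod
    def apply(self, x):
        """Return the result of applying this operator to x."""


class Doubler(Operator):
    """Concrete subclass: inherits from Operator and fills in apply."""

    @log_call
    def apply(self, x):
        return 2 * x


def main():
    logging.basicConfig(level=logging.INFO)  # send INFO messages to the console
    print(Doubler().apply(21))


# This runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

Trying to instantiate Operator directly raises a TypeError, which is how the abstract base class enforces that subclasses provide apply.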

Project Management

Unless you join a tiny startup, you'll be working with a group of other professionals, and they will want to see that you can seamlessly adopt the tools they use for managing the group.

This is where it is important to be aware of the many tools out there, but not necessarily have the experience with those tools because so many of them offer the same service. "Oh I have experience with Kanban boards but not with Trello; we use a GitHub extension called ZenHub to manage and track issues; I'm sure I can start using Trello with very limited overhead." ...or... "I haven't actually had the opportunity to use the Atlassian suite, but I know what it is and have heard of the tools it offers: Jira, Confluence, Crucible. Can't tell you exactly how each works, but I'm confident that I have the experience to pick it up really quickly; I know how important these things are to staying organized, especially on big projects."

By the way, if not mentioned in the interview, this could be a good follow-up question to ask... "How big of a team would I be working on? how is it organized? who manages it? how much communication is there going up the corporate chain vs down the chain? what tools would I be asked to use?" If asked the right way, it plants the idea that you are going to be working there #inception, but it is also helpful for you to filter out companies who have no idea what they are doing.

Here is a shortlist of products to be aware of and our recommendation to what level:

  • Agile is a concept or a philosophy, where Kanban and Scrum are different strategies to implement that philosophy. It is important to know those 3 words and their distinction, but only at a high level. Here is a good source
  • Then there are different software companies that design tools that use some of those strategies: Trello (Kanban), ZenHub (Kanban), Jira (Scrum; part of Atlassian suite), Asana, Notion (you'll be less likely to find this at a hardcore tech company and it is more of a place to share work and notes but it is getting more and more popular). It is good to know these by name but don't even bother sinking time to understand each one.
  • git is software for version control on your local computer, whereas GitHub, GitLab, and Bitbucket offer services to host the version control online, and more recently, have been offering their own services for project management type stuff. It is really important to know that git is not a tool from GitHub but rather that something like GitHub leverages the existing software called git. Also, just know that there is another version control system called svn (Subversion); it is super outdated but maybe you'll come across it at some ancient company.

As a software developer, code management is going to have to be a major skill. Check out the wiki page Hemang wrote on GitHub Process Flow

Cloud + Compute Clusters

Don't be a fool; there is no cloud; everything is a computer! Whether you are online browsing the web, grabbing code from a repo hosted on a site, or ssh-ed onto a remote node, you are using your computer to talk to another computer!

Wait, some vocab; mostly gonna steal from Google on this:

  • server: "a computer or computer program which manages access to a centralized resource or service in a network."
  • cluster: basically a connected system of a bunch of 'computers'; and when I say 'computers', I loosely mean something with an operating system and its own set of CPUs or GPUs. A cluster has a bunch of these computers as separate ways to use the cluster BUT generally has a single shared set of storage spaces so that no matter which computer you log into, you always have access to the same files.
  • node: "a piece of equipment, such as a PC or peripheral, attached to a network." ...I use it to refer to a 'computer' that is part of a larger cluster of computers. So instead of saying "I ssh'd into this computer part of the cluster", I would say "I ssh'd into this node part of the cluster".
  • seat: I doubt this term is very common elsewhere, but at GTRI it is. At our work we have a computer lab which is essentially a cluster of computers but where each 'node' has its own monitor and it is more a physical 'computer' I can 'sit' down in front of. So each seat has its own set of CPUs/GPUs BUT it also shares a file storage system so I can access my files from any seat.
  • ssh: eli5, basically it's an accepted method to get access to a computer from another computer assuming that they are somehow "connected" like through the internet
  • vpn: I can't say that I'm the best person to do an eli5 on this but I'll give it my best shot...in my head, it is like ssh in that you are requesting access to some network or computer. Once you are connected, any computer-to-computer communication (like using the internet) will first be sent from your computer to the vpn-d network and then on to the destination computer.

Here are 3 major providers of some 'cloud service': Google Cloud, Amazon Web Services, Microsoft Azure...basically any massive tech company that already had huge servers and computer networks scattered around the globe. In short, they provide access to their computers for some small cost. Since computers can do a crap ton of stuff, they have a crap ton of services as part of their cloud suite: computers for computing things (big ram, cpu, gpu usage), computers for storing/hosting files or websites or apps or databases, etc. Each of those has its own name with its own documentation and it can be EXTREMELY overwhelming, so at a minimum all you need is to be vaguely familiar with what they offer.