Lesson 01 Command line tutorial

ydara edited this page · 22 revisions

Borrowed and modified from planspace.org

You should know about ls, pwd, mv, rm, mkdir, rmdir.

The Manual

Documentation for all commands can be found using the man command (for *man*ual). You can even get the documentation for man itself.

yasins-mbp:~ yasin$ man man
  • space to scroll down;
  • b to scroll up;
  • q to quit.

Making and Switching to Directories

Make a directory where you can do some work:

yasins-mbp:Data Science yasin$ mkdir GADS1403
yasins-mbp:Data Science yasin$ cd GADS1403
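The prompt above shows a parent directory with a space in its name ("Data Science"). Here is a minimal sketch of how to create and enter a nested path like that in one go. It is written as a script rather than as prompt-by-prompt input, and the scratch directory from mktemp is an assumption made only so the example can run anywhere.

```shell
# Assumption: a throwaway scratch directory so the example is safe to run anywhere.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates any missing parent directories and does not complain
# if the directory already exists.
mkdir -p "Data Science/GADS1403"

# Quote paths that contain spaces so the shell treats them as one argument.
cd "Data Science/GADS1403"

# pwd prints the directory you are in now.
current=$(pwd)
echo "$current"
```

Without the quotes, the shell would split the path at the space and hand mkdir and cd two separate arguments.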

Fetching URLs

Download the jobs data:

yasins-mbp:GADS1403 yasin$ curl -L 'https://github.com/adparker/GADSLA_1403/raw/master/src/lesson01/jobs.tsv' -o jobs.tsv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    202      0 --:--:-- --:--:-- --:--:--   202
100 77215  100 77215    0     0  52713      0  0:00:01  0:00:01 --:--:--  112k

Show entire contents of a file

Show the entire contents of a file using cat:

yasins-mbp:GADS1403 yasin$ cat jobs.tsv
OTR and Local Truck Driver Jobs! Up to $7,500 Sign-On Bonuses!  Warren  MI  US  42.47754    -83.0277
Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!   Taylorsville    UT  US  40.66772    -111.93883
Loan Production Assistant   Minneapolis MN  US  44.97997    -93.26384
►Mortgage / Real Estate Professionals - Increase Your Business    Knoxville   TN  US  35.96064    -83.92074
Verizon Sales Representative    East Brunswick  NJ  US  40.42788    -74.41598
...

Paging Through a File

Wow, that was a lot of data. Let's use a pager instead:

yasins-mbp:GADS1403 yasin$ less jobs.tsv

Your entire screen should be filled with lines, and you'll be at the top of the file. Here are a few quick commands to use while you're in less:

  • q to quit;
  • return to scroll down one line;
  • space bar to page down;
  • b to page up;
  • G to jump to the end of the file;
  • /pattern to search forward for a line containing PATTERN;
      - n to search for the next line containing PATTERN;
      - N to search in reverse for a line containing PATTERN;
  • ?pattern starts the search in reverse for PATTERN;
  • &pattern show only lines containing PATTERN.

Word count

Let's see how big this file is:

yasins-mbp:GADS1403 yasin$ wc jobs.tsv
    1000   11140   77215 jobs.tsv

That is, 1000 lines, 11140 words, and 77215 bytes. If you just want the number of lines, you can do this:

yasins-mbp:GADS1403 yasin$ wc -l jobs.tsv
    1000 jobs.tsv
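If you want the count as a bare number, for example to store it in a variable, you can feed the file to wc on standard input so that it has no file name to print. The sketch below uses a small made-up file (sample.txt is a hypothetical name) instead of jobs.tsv so that it runs on its own. It also uses a pipe (|), which this tutorial covers further down, to strip the padding spaces that some versions of wc add.

```shell
# Assumption: a throwaway scratch directory and a tiny hypothetical sample file.
workdir=$(mktemp -d)
printf 'one two\nthree\nfour five six\n' > "$workdir/sample.txt"

# Reading from standard input (<) means wc has no file name to print.
# tr -d ' ' removes the leading padding that some wc versions emit.
line_count=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/sample.txt" | tr -d ' ')

echo "$line_count lines, $word_count words"
```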

First Part and Last Part of a File

This gets you the first 5 lines of the file:

yasins-mbp:GADS1403 yasin$ head -n 5 jobs.tsv
OTR and Local Truck Driver Jobs! Up to $7,500 Sign-On Bonuses!  Warren  MI  US  42.47754    -83.0277
Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!   Taylorsville    UT  US  40.66772    -111.93883
Loan Production Assistant   Minneapolis MN  US  44.97997    -93.26384
►Mortgage / Real Estate Professionals - Increase Your Business    Knoxville   TN  US  35.96064    -83.92074
Verizon Sales Representative    East Brunswick  NJ  US  40.42788    -74.41598

This gets you the last 5 lines of the file:

yasins-mbp:GADS1403 yasin$ tail -n 5 jobs.tsv
Life insurance-Sales professionals wanted   Gages Lake  IL  US  42.35169    -87.98258
Truck Driver - Regional Dedicated Route - Avg 60-65k per year!  Olympia WA  US  47.03787    -122.9007
CDL Truck Driver - Team Division - Excellent Pay & Benefits!    Pembroke Pines  FL  US  26.00315    -80.22394
Local Class A & C Truck Driver- Texas City TX   Texas City  TX  US  29.38384    -94.9027
Leasing Associate For Greentree Apartments - Starts in March    Huntington  WV  US  38.41925    -82.44515
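tail can also count from the top of the file: with a plus sign, tail -n +N starts printing at line N and continues to the end. Here is a small sketch on a made-up five-line file (sample.txt is a hypothetical name, not part of the lesson data).

```shell
# Assumption: a throwaway scratch directory and a hypothetical five-line file.
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/sample.txt"

# First two lines of the file.
first_two=$(head -n 2 "$workdir/sample.txt")

# Last line of the file.
last_one=$(tail -n 1 "$workdir/sample.txt")

# Everything from line 4 onward (note the plus sign).
from_fourth=$(tail -n +4 "$workdir/sample.txt")

echo "$first_two"
echo "$last_one"
echo "$from_fourth"
```

The plus form is handy for skipping a header row: tail -n +2 prints everything except the first line.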

Finding differences

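Download jobs2.tsv and jobs3.tsv. They are similar, but differ by a few lines. In the unified (-u) output, lines starting with + appear only in the second file, lines starting with - appear only in the first, and lines starting with a space are unchanged context. Only the first part of the output is shown here: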

yasins-mbp:GADS1403 yasin$ diff -u jobs2.tsv jobs3.tsv
--- jobs2.tsv   2014-03-04 17:38:59.000000000 -0800
+++ jobs3.tsv   2014-03-04 17:38:51.000000000 -0800
@@ -1,15 +1,21 @@
+Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!  Taylorsville    UT  US  40.66772    -111.93883
 Loan Production Assistant  Minneapolis MN  US  44.97997    -93.26384
 ►Mortgage / Real Estate Professionals - Increase Your Business   Knoxville   TN  US  35.96064    -83.92074
+Verizon Sales Representative   East Brunswick  NJ  US  40.42788    -74.41598
 Dosimetrist- Therapeutic Radiologic    San Antonio TX  US  29.42412    -98.49363
 NO TELEMARKETING - Team Lead - Sales & Marketing   Miami   FL  US  25.77427    -80.19366
 CDL Truck Driver - Regional Home Weekly or Daily - Excellent Pay   Waukegan    IL  US  42.36363    -87.84479
+FASHION / COSMETICS: Marketing. Advertising. Sales.    Pleasanton  CA  US  37.66243    -121.87468
 Target Mobile Wirless Team Member  Dearborn Heights    MI  US  42.33698    -83.27326
 Highway /Bridge Design Engineer    Memphis TN  US  35.14953    -90.04898
 Business Intelligence Manager – COGNOS, ETL, EIM - Houston, TX   Houston TX  US  29.76328    -95.36327
+Java/JavaScript Developer NYC 01062014 New York    NY  US  40.71427    -74.00597
 CDL A & B Truck Drivers Needed Bronx   NY  US  40.8373 -73.886
+Life Insurance Agent   Nashville   TN  US  36.16589    -86.78444
 Business Development Professional  Newark  DE  US  39.68372    -75.74966
 CDL Truck Driver - Flex Home Time - Immediate Opportunities    Hattiesburg MS  US  31.32712    -89.29034
 SIGN ON BONUSES! TMC Transportation Hiring CDL OTR Truck Drivers   Portsmouth  VA  US  36.83543    -76.29827
+Arthritis Research Studies – Up to $500 Compensation Fargo   ND  US  46.87719    -96.7898
 Owner Operator - Class A - Excellent Pay - Great Home Time!    Chicopee    MA  US  42.17256    -72.59491
 WE NEED DRIVERS! CDL CLASS A DRIVER NEEDED!    Woodbine    MD  US  39.335671   -77.06364
 Community Coordinator-International Students   Fordyce NE  US  42.750269   -97.3801
...
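diff also reports its result through its exit status: 0 when the files are identical and 1 when they differ. Here is a minimal sketch on two tiny made-up files (old.txt and new.txt are hypothetical names, not the lesson data).

```shell
# Assumption: a throwaway scratch directory and two hypothetical sample files.
workdir=$(mktemp -d)
printf 'Minneapolis\nKnoxville\nMiami\n' > "$workdir/old.txt"
printf 'Taylorsville\nMinneapolis\nKnoxville\nMiami\n' > "$workdir/new.txt"

# Save the unified diff to a file and record the exit status (1 = files differ).
status=0
diff -u "$workdir/old.txt" "$workdir/new.txt" > "$workdir/changes.diff" || status=$?

# Comparing a file with itself produces no output and exit status 0.
same_status=0
diff -u "$workdir/old.txt" "$workdir/old.txt" > /dev/null || same_status=$?

cat "$workdir/changes.diff"
echo "differing files: $status, identical files: $same_status"
```

The saved diff contains one line beginning with +, the Taylorsville line that exists only in new.txt.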

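Redirecting output

Let's save the first 4 lines to a new file. We'll introduce output redirection here: > sends a command's output to a file instead of the screen, creating the file or overwriting it if it already exists: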

yasins-mbp:GADS1403 yasin$ head -n 4 jobs.tsv > jobstmp.tsv
yasins-mbp:GADS1403 yasin$ cat jobstmp.tsv
OTR and Local Truck Driver Jobs! Up to $7,500 Sign-On Bonuses!  Warren  MI  US  42.47754    -83.0277
Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!   Taylorsville    UT  US  40.66772    -111.93883
Loan Production Assistant   Minneapolis MN  US  44.97997    -93.26384
►Mortgage / Real Estate Professionals - Increase Your Business    Knoxville   TN  US  35.96064    -83.92074
yasins-mbp:GADS1403 yasin$

And then let's append the last 4 lines to the same file. Using >> adds to the end of the file instead of overwriting it:

yasins-mbp:GADS1403 yasin$ tail -n 4 jobs.tsv >> jobstmp.tsv
yasins-mbp:GADS1403 yasin$ cat jobstmp.tsv
OTR and Local Truck Driver Jobs! Up to $7,500 Sign-On Bonuses!  Warren  MI  US  42.47754    -83.0277
Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!   Taylorsville    UT  US  40.66772    -111.93883
Loan Production Assistant   Minneapolis MN  US  44.97997    -93.26384
►Mortgage / Real Estate Professionals - Increase Your Business    Knoxville   TN  US  35.96064    -83.92074
Truck Driver - Regional Dedicated Route - Avg 60-65k per year!  Olympia WA  US  47.03787    -122.9007
CDL Truck Driver - Team Division - Excellent Pay & Benefits!    Pembroke Pines  FL  US  26.00315    -80.22394
Local Class A & C Truck Driver- Texas City TX   Texas City  TX  US  29.38384    -94.9027
Leasing Associate For Greentree Apartments - Starts in March    Huntington  WV  US  38.41925    -82.44515
yasins-mbp:GADS1403 yasin$
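The difference between > and >> is easy to get wrong, so here is a small sketch that shows both on a made-up file (log.txt is a hypothetical name).

```shell
# Assumption: a throwaway scratch directory and a hypothetical output file.
workdir=$(mktemp -d)
out="$workdir/log.txt"

echo 'first' > "$out"      # > creates the file (or empties it if it exists)
echo 'second' >> "$out"    # >> appends to the end of the file
after_append=$(cat "$out")

echo 'third' > "$out"      # > again: the previous contents are replaced
after_overwrite=$(cat "$out")

echo "$after_append"
echo "$after_overwrite"
```

After the append the file holds two lines. After the second > it holds only the line "third".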

Cutting things up

Use cut to chop up the file vertically along the tab character. This is how you pull out the 2nd column, which contains cities:

yasins-mbp:GADS1403 yasin$ cut -f 2 jobstmp.tsv
Warren
Taylorsville
Minneapolis
Knoxville
Olympia
Pembroke Pines
Texas City
Huntington
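cut can pull out several columns at once, and it can split on a delimiter other than tab. Here is a sketch on made-up data (sample.tsv and sample.csv are hypothetical files that only imitate the title, city, state layout of the jobs data).

```shell
# Assumption: a throwaway scratch directory and hypothetical sample files.
workdir=$(mktemp -d)
printf 'Driver\tWarren\tMI\nAssistant\tMinneapolis\tMN\n' > "$workdir/sample.tsv"
printf 'Driver,Warren,MI\n' > "$workdir/sample.csv"

# -f accepts a comma-separated list of fields; the output stays tab-separated.
city_state=$(cut -f 2,3 "$workdir/sample.tsv")

# -d sets the delimiter, here a comma instead of the default tab.
city=$(cut -d , -f 2 "$workdir/sample.csv")

echo "$city_state"
echo "$city"
```

Note that cut splits on every delimiter character, so it is not a full CSV parser: a quoted field that contains a comma would be split in the middle.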

Sorting

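What if we want the cities sorted? A pipe (|) sends the output of one command to the input of the next, so we can feed the output of cut straight into sort: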

yasins-mbp:GADS1403 yasin$ cut -f 2 jobstmp.tsv | sort
Huntington
Knoxville
Minneapolis
Olympia
Pembroke Pines
Taylorsville
Texas City
Warren

Counting

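What about the top ten most popular cities? Sort the cities so that duplicates are adjacent, count each run with uniq -c, sort the counts numerically in reverse (-rn), and keep the first ten lines: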

yasins-mbp:GADS1403 yasin$ cut -f 2 jobs.tsv | sort | uniq -c | sort -rn | head -n 10
  26 New York
  16 Houston
  13 Cincinnati
  13 Chicago
  12 Dallas
  10 Washington
  10 Raleigh
  10 Nashville
  10 Columbus
   9 Los Angeles
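The first sort in that pipeline matters because uniq only collapses duplicates that sit next to each other. Here is a sketch on a small made-up list (cities.txt is a hypothetical file) that shows the difference.

```shell
# Assumption: a throwaway scratch directory and a hypothetical six-line city list.
workdir=$(mktemp -d)
printf 'Houston\nNew York\nHouston\nChicago\nNew York\nNew York\n' > "$workdir/cities.txt"

# Without sorting first, repeated cities that are not adjacent are counted separately.
unsorted_groups=$(uniq -c "$workdir/cities.txt" | wc -l | tr -d ' ')

# With sorting first, each city appears exactly once with its total.
sorted_groups=$(sort "$workdir/cities.txt" | uniq -c | wc -l | tr -d ' ')

# sort -rn puts the largest count first; head -n 1 keeps only the top entry.
top=$(sort "$workdir/cities.txt" | uniq -c | sort -rn | head -n 1)

echo "groups without sort: $unsorted_groups, with sort: $sorted_groups"
echo "$top"
```

The width of the padding in front of each count varies between systems, so strip leading spaces before comparing or parsing the counts.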

Simple translation

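Let's convert all the cities to uppercase. The character ranges are quoted so that the shell passes them to tr unchanged instead of treating them as filename patterns:

yasins-mbp:GADS1403 yasin$ cut -f 2 jobstmp.tsv | tr '[a-z]' '[A-Z]'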
WARREN
TAYLORSVILLE
MINNEAPOLIS
KNOXVILLE
OLYMPIA
PEMBROKE PINES
TEXAS CITY
HUNTINGTON
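tr also understands POSIX character classes such as [:lower:] and [:upper:], and with -d it deletes characters instead of translating them. Here is a minimal sketch on a single made-up value rather than the lesson file.

```shell
# Assumption: one sample value taken from the city list above.
city='Pembroke Pines'

# Character classes are an alternative to spelling out a-z and A-Z.
upper=$(printf '%s\n' "$city" | tr '[:lower:]' '[:upper:]')

# -d deletes every occurrence of the listed characters (here, spaces).
no_spaces=$(printf '%s\n' "$city" | tr -d ' ')

echo "$upper"
echo "$no_spaces"
```

Keep the quotes around the classes: an unquoted [a-z] or [:lower:] is a shell glob, and it can expand to a matching file name in the current directory before tr ever sees it.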

Pasting Columns Back Together

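First save the uppercased cities to caps.txt, then use paste to join that file, line by line, with jobstmp.tsv:

yasins-mbp:GADS1403 yasin$ cut -f 2 jobstmp.tsv | tr '[a-z]' '[A-Z]' > caps.txt
yasins-mbp:GADS1403 yasin$ paste -d '\t' caps.txt jobstmp.tsv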
WARREN  OTR and Local Truck Driver Jobs! Up to $7,500 Sign-On Bonuses!  Warren  MI  US  42.47754    -83.0277
TAYLORSVILLE    Great OTR and Local Truck Drivers! Benefits, Home Time and Bonus Pay!   Taylorsville    UT  US  40.66772    -111.93883
MINNEAPOLIS Loan Production Assistant   Minneapolis MN  US  44.97997    -93.26384
KNOXVILLE   ►Mortgage / Real Estate Professionals - Increase Your Business    Knoxville   TN  US  35.96064    -83.92074
OLYMPIA Truck Driver - Regional Dedicated Route - Avg 60-65k per year!  Olympia WA  US  47.03787    -122.9007
PEMBROKE PINES  CDL Truck Driver - Team Division - Excellent Pay & Benefits!    Pembroke Pines  FL  US  26.00315    -80.22394
TEXAS CITY  Local Class A & C Truck Driver- Texas City TX   Texas City  TX  US  29.38384    -94.9027
HUNTINGTON  Leasing Associate For Greentree Apartments - Starts in March    Huntington  WV  US  38.41925    -82.44515
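Here is the whole round trip (cut a column, transform it, paste it back) as a self-contained sketch on made-up data. The file sample.tsv and its two rows are hypothetical; caps.txt plays the same role as in the command above.

```shell
# Assumption: a throwaway scratch directory and a hypothetical two-row sample file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'Driver\tWarren\nAssistant\tMinneapolis\n' > sample.tsv

# Extract the city column, uppercase it, and save it to its own file.
cut -f 2 sample.tsv | tr '[:lower:]' '[:upper:]' > caps.txt

# paste joins line N of each file; tab is the default separator,
# so -d '\t' can be left out.
merged=$(paste caps.txt sample.tsv)

echo "$merged"
```

paste matches lines purely by position, so both files need the same number of lines in the same order.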

Challenge:

Write a Python program that prints a table (or any other neatly formatted output) of the twenty most frequently used words in the whispers.csv file, which you will find in DS-LA-03/src/lesson01 in this repository.

Important: Include your full name at the top of the file as a comment.

Also Important: Do your best to adhere to good coding standards, and write neatly formatted code. Comment where appropriate.

Submission: Submit your assignment by attaching it to a card under "Homework" on our course Trello board (https://trello.com/b/2gZEFsC8/ds-la-04). Please title your card "(Your Name): Assignment 1".
