# Exercise 2 - shell, pipes, and csvkit

In this exercise, we'll review a few shell commands and explore working with pipes and csvkit.

You will need to fill in commands whenever prompted.  Please replace the text with your solution.

Remember to submit your completed `.ipynb` file to Blackboard and to add/commit it to your Git repository and push it to GitHub.


## Part 1 - shell commands, redirection, and pipes

### Basic shell commands and redirection

Create a directory called `part1` using `mkdir`.

In [1]:
!mkdir part1

Rename `part1` to `partone` using `mv`.

In [2]:
!mv part1 partone
!ls

30  exercise-02.ipynb  partone	siddhartha.txt	Untitled.ipynb


Create a file named `filelist.txt` using the output from `ls` and the output redirector `>`.

In [5]:
!ls > filelist.txt

In [6]:
!cat filelist.txt

30
exercise-02.ipynb
filelist.txt
partone
siddhartha.txt
Untitled.ipynb


Append to `filelist.txt` using the output appending redirector `>>`.  Note the difference between the single `>` and double `>>`.

In [7]:
!ls >> filelist.txt
!cat filelist.txt

30
exercise-02.ipynb
filelist.txt
partone
siddhartha.txt
Untitled.ipynb
30
exercise-02.ipynb
filelist.txt
partone
siddhartha.txt
Untitled.ipynb


In [8]:
!ls > filelist.txt
!cat filelist.txt

30
exercise-02.ipynb
filelist.txt
partone
siddhartha.txt
Untitled.ipynb
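
The overwrite-versus-append behavior seen above has a close analogue in Python's file modes, which may make it easier to remember. The sketch below is only an analogy and not part of the exercise; the file name `demo-filelist.txt` and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo-filelist.txt")  # hypothetical file name

# Mode "w" behaves like `>`: create the file, or truncate it if it exists.
with open(path, "w") as f:
    f.write("first\n")

# Mode "a" behaves like `>>`: create the file if needed, otherwise append.
with open(path, "a") as f:
    f.write("second\n")

with open(path) as f:
    after_append = f.read().splitlines()

# Writing with "w" again discards the two earlier lines.
with open(path, "w") as f:
    f.write("third\n")

with open(path) as f:
    after_overwrite = f.read().splitlines()

print(after_append)
print(after_overwrite)
```

After the append the file holds both lines; after the second `"w"` write only the newest line remains, which mirrors what happened to `filelist.txt` in cells 7 and 8.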


What's the difference between `>` and `>>`?


In [11]:
# The difference between > and >>
print("The > operator redirects a command's output into a file. If the file exists, its contents are overwritten; if it does not exist, a new file is created. In the example above, `ls > filelist.txt` created filelist.txt. The >> operator appends a command's output to a file. If the file does not exist, >> creates it, just as > does. In the example above, `ls >> filelist.txt` appended a second listing to the existing filelist.txt.")

The > operator redirects a command's output into a file. If the file exists, its contents are overwritten; if it does not exist, a new file is created. In the example above, `ls > filelist.txt` created filelist.txt. The >> operator appends a command's output to a file. If the file does not exist, >> creates it, just as > does. In the example above, `ls >> filelist.txt` appended a second listing to the existing filelist.txt.

### Your turn

Complete the following tasks in the cells provided.  All the tests in the testing cells (with the `assert` statements) below should pass without error - be sure to execute those as well, and if you see errors, fix your answer and try testing again until there are no errors.

Create a directory called `mydirectory`.

In [5]:
!mkdir mydirectory

In [6]:
import os
assert 'mydirectory' in os.listdir('.')

Using `ls` and output redirection, create a file called `myfiles.txt` in the directory `mydirectory` that contains the list of files in the current directory.

In [7]:
!ls > myfiles.txt
!mv myfiles.txt ./mydirectory/

In [8]:
assert 'myfiles.txt' in os.listdir('mydirectory')

In [9]:
myfiles = open('mydirectory/myfiles.txt').read()
assert 'exercise-02.ipynb' in myfiles

Clean up the directory you just created by removing its contents (the file you created) using `rm`.

In [10]:
!rm ./mydirectory/*.*

In [11]:
assert 'myfiles.txt' not in os.listdir('mydirectory')

Now remove the directory itself using `rmdir`.

In [12]:
!rmdir mydirectory

In [13]:
assert 'mydirectory' not in os.listdir('.')

### Filters and pipes

Let's look at something a little more interesting.  Download the text of Hermann Hesse's *Siddhartha* from [Project Gutenberg](http://www.gutenberg.org/):

In [14]:
!wget https://www.gutenberg.org/cache/epub/2500/pg2500.txt

--2016-09-08 20:58:18--  https://www.gutenberg.org/cache/epub/2500/pg2500.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 241176 (236K) [text/plain]
Saving to: ‘pg2500.txt’


2016-09-08 20:58:19 (1.13 MB/s) - ‘pg2500.txt’ saved [241176/241176]



*Note*: sometimes Project Gutenberg restricts access.  If that creates an error for you, you should be able to `wget` the same file from our class repository on GitHub at the url https://github.com/gwsb-istm-6212-fall-2016/syllabus-and-schedule/raw/master/exercises/pg2500.txt.

However you get the file, let's rename it to something easier to remember.

In [15]:
!mv pg2500.txt siddhartha.txt

`head` and `tail` are very useful.  They let you take a quick peek at the start and end of files.

In [16]:
!head siddhartha.txt

﻿The Project Gutenberg EBook of Siddhartha, by Herman Hesse

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: Siddhartha



In [17]:
!tail siddhartha.txt



Most people start at our Web site which has the main PG search facility:

     http://www.gutenberg.org

This Web site includes information about Project Gutenberg-tm,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how to
subscribe to our email newsletter to hear about new eBooks.


`grep` is one of the most useful filters.  It lets you search for and match lines that contain specific expressions.  For example, to find mentions of "copyright":

In [18]:
!grep copyright siddhartha.txt

one owns a United States copyright in these works, so the Foundation
permission and without paying copyright royalties.  Special rules,
(trademark/copyright) agreement.  If you do not agree to abide by all
or PGLAF), owns a compilation copyright in the collection of Project
1.D.  The copyright laws of the place where you are located also govern
the copyright status of any work in any country outside the United
posted with permission of the copyright holder), the work can be copied
with the permission of the copyright holder, your use and distribution
terms imposed by the copyright holder.  Additional terms will be linked
permission of the copyright holder found at the beginning of this work.
effort to identify, do copyright research on, transcribe and proofread
corrupt data, transcription errors, a copyright or other intellectual
unless a copyright notice is included.  Thus, we do not necessarily


Notice anything that those lines have in common?

Let's add a little more information by including the `-n` flag to add matching line numbers.

In [19]:
!grep -n copyright siddhartha.txt

3973:one owns a United States copyright in these works, so the Foundation
3975:permission and without paying copyright royalties.  Special rules,
4010:(trademark/copyright) agreement.  If you do not agree to abide by all
4029:or PGLAF), owns a compilation copyright in the collection of Project
4044:1.D.  The copyright laws of the place where you are located also govern
4051:the copyright status of any work in any country outside the United
4070:posted with permission of the copyright holder), the work can be copied
4080:with the permission of the copyright holder, your use and distribution
4082:terms imposed by the copyright holder.  Additional terms will be linked
4084:permission of the copyright holder found at the beginning of this work.
4155:effort to identify, do copyright research on, transcribe and proofread
4160:corrupt data, transcription errors, a copyright or other intellectual
4308:unless a copyright notice is included.  Thus, we do not necessarily

Now let's look for any mention of "river".  This will match a lot of text, so we'll just take the first 10 matching lines by *piping* the output from `grep` into `head`.

In [20]:
!grep -n river siddhartha.txt | head

52:In the shade of the house, in the sunshine of the riverbank near the
56:tanned his light shoulders by the banks of the river when bathing,
107:came into his mind, flowing from the water of the river, sparkling from
329:the river and to perform the first ablution."
460:the same short numbing is what the driver of an ox-cart finds in the
469:is no driver of an ox-cart and a Samana is no drunkard.  It's true that
580:submerged in the murky river of physical forms.  Many wonderful and
1062:and the river flowed, the forest and the mountains were rigid, all of it
1065:all this yellow and blue, river and forest, entered Siddhartha for the
1069:who scorns diversity, who seeks unity.  Blue was blue, river was river,
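
Notice that `grep river` matches substrings: lines 460 and 469 above are included only because "driver" contains "river", and line 52 because of "riverbank". The following sketch contrasts a substring match with a whole-word match using Python's `re` module; the three sample lines are taken (or shortened) from the output above, and the variable names are made up for illustration.

```python
import re

# Sample lines taken or shortened from the grep output above.
lines = [
    "tanned his light shoulders by the banks of the river when bathing,",
    "is no driver of an ox-cart and a Samana is no drunkard.",
    "In the shade of the house, in the sunshine of the riverbank near the",
]

# Substring match, like plain `grep river`.
substring_hits = [line for line in lines if "river" in line]

# Whole-word match: \b marks a word boundary on each side of "river".
whole_word = re.compile(r"\briver\b")
word_hits = [line for line in lines if whole_word.search(line)]

print(len(substring_hits), len(word_hits))
```

On the command line, `grep -w river` applies the same whole-word restriction.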


How many lines contain "river"?  We can count by piping into the word count tool `wc`.

In [21]:
!grep river siddhartha.txt | wc

    109    1365    7443


That's 109 matching lines, containing 1365 words and 7443 characters (strictly, bytes).  If you just want the line count, use `wc -l`:

In [22]:
!grep river siddhartha.txt | wc -l

109
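
To make the three `wc` numbers concrete, here is a rough Python sketch that computes the same kinds of counts for a short string. It assumes `wc`'s default behavior (newline count, whitespace-separated word count, byte count); the sample text is invented.

```python
# Two short sample lines, each ending in a newline.
text = "the river flowed\nblue was blue, river was river,\n"

line_count = text.count("\n")           # wc -l counts newline characters
word_count = len(text.split())          # words are whitespace-separated runs
byte_count = len(text.encode("utf-8"))  # wc reports bytes by default

print(line_count, word_count, byte_count)
```

For plain ASCII text the byte count and the character count are the same, which is why the two are often used interchangeably.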


What if we want to match both upper- and lower-case text?  Use `grep -i`:

In [23]:
!grep time siddhartha.txt | wc -l

166


In [24]:
!grep -i time siddhartha.txt | wc -l

167


### Your turn

How many lines in *Siddhartha* contain "other" (just lower-case)?  Start by using `grep` to extract lines that match the word "other" in `siddhartha.txt` and redirecting it to a file called `other-lines.txt`.

In [25]:
!grep -n other siddhartha.txt > other-lines.txt
raise NotImplementedError()

NotImplementedError: 

In [26]:
%sc h_other = head -1 other-lines.txt
assert "other" in h_other

In [27]:
%sc t_other = tail -1 other-lines.txt
assert "other" in t_other

Now count up the lines in the file you created using `wc`.

In [34]:
# Read the file on stdin so that wc prints only the number, without the file name.
linescount = !wc -l < other-lines.txt

In [35]:
assert "127" in linescount

Your answer should be 127!

### Counting words with `grep`

By piping commands together we can do a lot of powerful things right at the command line.  Let's create a count of the most commonly occurring words in *Siddhartha*.  We could write a Python script to count words, but with shell tools a well-built pipeline can often accomplish tasks like this in one line.

First we need to split up the text lines into a word per line.  There are `grep` flags for that!

In [46]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | head -10

The
Project
Gutenberg
EBook
of
Siddhartha
by
Herman
Hesse
This
grep: write error: Broken pipe
cat: write error: Broken pipe
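
The pattern `\w{2,}` matches runs of two or more word characters, and `-o` prints each match on its own line. The doubled braces in the cell above appear to be there because IPython treats `{...}` inside a `!` command as a Python expression to interpolate; in an ordinary shell the pattern would presumably be written with single braces. A small Python sketch of the same tokenization, using an invented sample sentence:

```python
import re

sample = "The Project Gutenberg EBook of Siddhartha, by Herman Hesse: a tale"

# \w{2,} matches runs of two or more word characters (letters, digits, underscore),
# so the one-letter word "a" and all punctuation are dropped.
tokens = re.findall(r"\w{2,}", sample)

print(tokens)
```

Note that this pattern also drops legitimate one-letter words such as "a" and "I", which is a deliberate simplification in this exercise.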


Now we need to sort them and count the unique tokens.  `sort` solves the first problem.

In [47]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | sort | head -10

000
1500
1887
20
2001
2008
2011
2013
23
2500
sort: write failed: 'standard output': Broken pipe
sort: write error


And `uniq -c` solves the second problem.

In [48]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | sort | uniq -c | head -25

      1 000
      1 1500
      1 1887
      1 20
      1 2001
      1 2008
      1 2011
      1 2013
      1 23
      4 2500
      1 30
      1 4557
      1 50
      2 501
      1 596
      1 60
      1 6221541
      1 64
      1 801
      1 809
      1 84116
      2 90
      1 99712
      2 abandon
      2 abandoned
uniq: write error: Broken pipe


But there's a catch... do you see it?

We need to convert all the words to lower case so that, for example, "The" and "the" are counted as the same word.  There's another command, `tr`, for that.

In [51]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | head -25

      1 000
      1 1500
      1 1887
      1 20
      1 2001
      1 2008
      1 2011
      1 2013
      1 23
      4 2500
      1 30
      1 4557
      1 50
      2 501
      1 596
      1 60
      1 6221541
      1 64
      1 801
      1 809
      1 84116
      2 90
      1 99712
      2 abandon
      2 abandoned
uniq: write error: Broken pipe


...and if we want to know only the top 10 words in *Siddhartha*, we need to sort the output by count.

In [52]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort | head -10

      1 000
    101 do
    106 more
    108 any
    108 so
     10 access
     10 agree
     10 arms
     10 ask
     10 ate
sort: write failed: 'standard output': Broken pipe
sort: write error


But that sorts by character, not number.  Fortunately, `sort -n` does what we want.

In [54]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n | head -10

      1 000
      1 1500
      1 1887
      1 20
      1 2001
      1 2008
      1 2011
      1 2013
      1 23
      1 30
sort: write failed: 'standard output': Broken pipe
sort: write error
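
The difference between sorting by character and sorting by number is easy to reproduce in Python. This is only an illustrative sketch with made-up counts, not a model of how `sort` is implemented:

```python
counts = ["101", "10", "9", "1", "108"]

# Default string sort compares character by character, like plain `sort`.
by_character = sorted(counts)

# Converting to int first compares numeric values, like `sort -n`.
by_number = sorted(counts, key=int)

# reverse=True flips the order, like `sort -rn`.
by_number_desc = sorted(counts, key=int, reverse=True)

print(by_character)
print(by_number)
print(by_number_desc)
```

In the character-by-character ordering, "9" lands after "108" because the comparison stops at the first differing character.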


But that's the wrong end of the list!  Two ways to fix that:  (a) use `tail` instead of `head`; (b) use `sort -rn`, which will sort in reverse order.  Let's try the latter.

In [55]:
!cat siddhartha.txt | grep -oE '\w{{2,}}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -10

   2221 the
   1434 and
   1225 to
   1106 of
    960 he
    708 his
    686 in
    540 you
    524 had
    512 was
sort: write failed: 'standard output': Broken pipe
sort: write error
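
For comparison, here is roughly what the Python script mentioned at the start of this section might look like. It is a sketch under stated assumptions: `top_words` is a hypothetical helper name, the sample sentence is invented, and Python's `\w` is Unicode-aware, so counts on a real file could differ slightly from the `grep` pipeline.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words of two or more characters, lower-cased."""
    words = re.findall(r"\w{2,}", text)      # grep -oE '\w{2,}'
    lowered = [w.lower() for w in words]     # tr '[:upper:]' '[:lower:]'
    return Counter(lowered).most_common(n)   # sort | uniq -c | sort -rn | head -n


sample = "The river was the river. A river is a river, and the water is the water."
result = top_words(sample, n=3)
print(result)
```

To try it on the book, the text could be read with something like `open("siddhartha.txt", encoding="utf-8").read()` and passed to `top_words`. The shell pipeline is shorter, but the Python version is easier to extend, for example to skip the Project Gutenberg license text.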


### Your turn

Download *Alice in Wonderland* from http://www.gutenberg.org/cache/epub/11/pg11.txt (or https://github.com/gwsb-istm-6212-fall-2016/syllabus-and-schedule/raw/master/exercises/pg11.txt if the first url doesn't work).

In [57]:
!wget http://www.gutenberg.org/cache/epub/11/pg11.txt

--2016-09-08 22:00:52--  http://www.gutenberg.org/cache/epub/11/pg11.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167518 (164K) [text/plain]
Saving to: ‘pg11.txt.1’


2016-09-08 22:00:52 (39.8 MB/s) - ‘pg11.txt.1’ saved [167518/167518]

In [59]:
assert 'pg11.txt' in os.listdir('.')

Now rename `pg11.txt` to `alice.txt`.

In [60]:
!mv pg11.txt alice.txt

In [62]:
assert 'alice.txt' in os.listdir('.')

Take a look at the next cell.  Will it find the top 25 unique words in *Alice in Wonderland* successfully?

In [64]:
!cat alice.txt | grep -oE '\w{{2,}}' | sort | uniq -c | head -25

      1 000
      4 11
      1 1500
      1 1887
      1 1994
      2 20
      1 2001
      1 2008
      1 2011
      1 25
      1 30
      1 4557
      1 50
      2 501
      1 596
      1 60
      1 6221541
      1 64
      1 801
      1 809
      1 84116
      2 90
      1 99712
      2 abide
      1 able
uniq: write error: Broken pipe


Describe what needs to be done to the previous cell to get it to work correctly.  Describe it using words, explaining the issues, rather than using shell commands!

In [75]:
print("Following the examples in the previous section, three changes are needed. First, all words should be converted to lower case (with tr '[:upper:]' '[:lower:]') so that the same word in different cases is counted once. Second, after uniq -c the output is still ordered alphabetically by word, so a second sort is needed to order it by count; plain sort compares character by character, so sort -n is used to compare the counts as numbers. Third, sort -n puts the smallest counts first, so either use tail -25 instead of head -25, or use sort -rn instead of sort -n to reverse the order.")

Following the examples in the previous section, three changes are needed. First, all words should be converted to lower case (with tr '[:upper:]' '[:lower:]') so that the same word in different cases is counted once. Second, after uniq -c the output is still ordered alphabetically by word, so a second sort is needed to order it by count; plain sort compares character by character, so sort -n is used to compare the counts as numbers. Third, sort -n puts the smallest counts first, so either use tail -25 instead of head -25, or use sort -rn instead of sort -n to reverse the order.


Okay, now implement your solution using shell commands with a pipeline.

In [76]:
!cat alice.txt | grep -oE '\w{{2,}}' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -25

   1818 the
    940 and
    809 to
    631 of
    610 it
    553 she
    481 you
    462 said
    431 in
    403 alice
    358 was
    330 that
    274 as
    248 her
    228 with
    227 at
    204 on
    200 all
    181 this
    179 for
    178 had
    175 but
    167 be
    166 not
    155 they
sort: write failed: 'standard output': Broken pipe
sort: write error

## Part 2 - csvkit basics

Let's look at some CSV data using csvkit.  Download the 2015 fourth quarter trip dataset from [Capital Bikeshare's trip history data](https://www.capitalbikeshare.com/trip-history-data):

In [77]:
!wget https://s3.amazonaws.com/capitalbikeshare-data/2015-Q4-cabi-trip-history-data.zip

--2016-09-08 22:27:52--  https://s3.amazonaws.com/capitalbikeshare-data/2015-Q4-cabi-trip-history-data.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.72.58
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.72.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14308794 (14M) [application/zip]
Saving to: ‘2015-Q4-cabi-trip-history-data.zip’


2016-09-08 22:27:55 (4.79 MB/s) - ‘2015-Q4-cabi-trip-history-data.zip’ saved [14308794/14308794]



Let's unzip it, rename it to something short, and take a look.

In [78]:
!unzip 2015-Q4-cabi-trip-history-data.zip

Archive:  2015-Q4-cabi-trip-history-data.zip
  inflating: 2015-Q4-Trips-History-Data.csv  


In [79]:
!mv 2015-Q4-Trips-History-Data.csv q4.csv

In [80]:
!head q4.csv

Duration (ms),Start date,End date,Start station number,Start station,End station number,End station,Bike #,Member type
166050,10/1/2015 0:01,10/1/2015 0:04,31602,Park Rd & Holmead Pl NW,31105,14th & Harvard St NW,W21109,Registered
379172,10/1/2015 0:01,10/1/2015 0:07,31314,34th & Water St NW,31237,25th St & Pennsylvania Ave NW,W20603,Registered
696038,10/1/2015 0:01,10/1/2015 0:13,31214,17th & Corcoran St NW,31214,17th & Corcoran St NW,W01233,Registered
219423,10/1/2015 0:02,10/1/2015 0:06,31104,Adams Mill & Columbia Rd NW,31121,Calvert St & Woodley Pl NW,W00218,Registered
253230,10/1/2015 0:03,10/1/2015 0:07,31102,11th & Kenyon St NW,31102,11th & Kenyon St NW,W21612,Registered
655251,10/1/2015 0:03,10/1/2015 0:14,31242,18th St & Pennsylvania Ave NW,31114,18th St & Wyoming Ave NW,W22093,Registered
309212,10/1/2015 0:06,10/1/2015 0:11,31280,11th & S St NW,31278,18th & R St NW,W00231,Registered
776195,10/1/2015 0:07,10/1/2015 0:20,31226,34th St & Wisconsin Ave NW,31214,17

csvkit gives us great tools for examining and working with CSV data.  We start by looking at the columns:

In [82]:
!csvcut -n q4.csv

  1: Duration (ms)
  2: Start date
  3: End date
  4: Start station number
  5: Start station
  6: End station number
  7: End station
  8: Bike #
  9: Member type
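
For readers who want to see what `csvcut -n` is doing, the listing can be approximated with Python's standard `csv` module. This is a sketch, not csvkit's implementation; the sample data is the header and first row from the `head q4.csv` output above, held in a string so the example runs without the file.

```python
import csv
import io

# Header and first data row copied from the `head q4.csv` output above.
sample_csv = (
    "Duration (ms),Start date,End date,Start station number,Start station,"
    "End station number,End station,Bike #,Member type\n"
    "166050,10/1/2015 0:01,10/1/2015 0:04,31602,Park Rd & Holmead Pl NW,"
    "31105,14th & Harvard St NW,W21109,Registered\n"
)

reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)  # the first row holds the column names

# Number columns from 1, matching the `csvcut -n` listing.
numbered = [f"{i:3d}: {name}" for i, name in enumerate(header, start=1)]
print("\n".join(numbered))
```

The 1-based numbers are the same ones passed to `-c` in the cells that follow.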


We can also extract just a few columns with `csvcut`:

In [83]:
!csvcut -c1,5,7 q4.csv | head -10

Duration (ms),Start station,End station
166050,Park Rd & Holmead Pl NW,14th & Harvard St NW
379172,34th & Water St NW,25th St & Pennsylvania Ave NW
696038,17th & Corcoran St NW,17th & Corcoran St NW
219423,Adams Mill & Columbia Rd NW,Calvert St & Woodley Pl NW
253230,11th & Kenyon St NW,11th & Kenyon St NW
655251,18th St & Pennsylvania Ave NW,18th St & Wyoming Ave NW
309212,11th & S St NW,18th & R St NW
776195,34th St & Wisconsin Ave NW,17th & Corcoran St NW
151604,Wilson Blvd & N Uhle St,N Veitch  & 20th St N


...and make it look better with `csvlook`:

In [84]:
!csvcut -c1,5,7 q4.csv | head -10 | csvlook

|----------------+-------------------------------+--------------------------------|
|  Duration (ms) | Start station                 | End station                    |
|----------------+-------------------------------+--------------------------------|
|  166050        | Park Rd & Holmead Pl NW       | 14th & Harvard St NW           |
|  379172        | 34th & Water St NW            | 25th St & Pennsylvania Ave NW  |
|  696038        | 17th & Corcoran St NW         | 17th & Corcoran St NW          |
|  219423        | Adams Mill & Columbia Rd NW   | Calvert St & Woodley Pl NW     |
|  253230        | 11th & Kenyon St NW           | 11th & Kenyon St NW            |
|  655251        | 18th St & Pennsylvania Ave NW | 18th St & Wyoming Ave NW       |
|  309212        | 11th & S St NW                | 18th & R St NW                 |
|  776195        | 34th St & Wisconsin Ave NW    | 17th & Corcoran St NW          |
|  151604        | Wilson Blvd & N Uhle St       | N Veitch  & 20th St N    

It gets even better.  Try `csvgrep`:

In [90]:
!csvcut -c1,5,7 q4.csv | csvgrep -c3 -m '21st & I St NW' | head -10 | csvlook

|----------------+----------------------------------+-----------------|
|  Duration (ms) | Start station                    | End station     |
|----------------+----------------------------------+-----------------|
|  872420        | 5th St & Massachusetts Ave NW    | 21st & I St NW  |
|  1249207       | 11th & Kenyon St NW              | 21st & I St NW  |
|  548066        | 17th & Corcoran St NW            | 21st & I St NW  |
|  604362        | 18th St & Wyoming Ave NW         | 21st & I St NW  |
|  1556978       | Army Navy Dr & S Nash St         | 21st & I St NW  |
|  586738        | New Hampshire Ave & T St NW      | 21st & I St NW  |
|  259421        | 17th & K St NW / Farragut Square | 21st & I St NW  |
|  269179        | M St & Pennsylvania Ave NW       | 21st & I St NW  |
|  227962        | 25th St & Pennsylvania Ave NW    | 21st & I St NW  |
|----------------+----------------------------------+-----------------|
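
The filtering step can be sketched in Python as well. The example assumes that `csvgrep -m` keeps rows whose chosen column contains the given text; the sample rows come from the outputs above, and the variable names are illustrative only.

```python
import csv
import io

# A small sample shaped like columns 1, 5 and 7; rows taken from outputs above.
sample_csv = """Duration (ms),Start station,End station
166050,Park Rd & Holmead Pl NW,14th & Harvard St NW
872420,5th St & Massachusetts Ave NW,21st & I St NW
548066,17th & Corcoran St NW,21st & I St NW
"""

target = "21st & I St NW"
rows = csv.DictReader(io.StringIO(sample_csv))

# Keep rows whose End station contains the target text
# (assumed here to match what `csvgrep -c3 -m` does).
matches = [row for row in rows if target in row["End station"]]

for row in matches:
    print(row["Duration (ms)"], row["Start station"])
```

Because `csv.DictReader` keys each row by column name, the filter refers to "End station" directly rather than to column number 3.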


But wait, there's more:

In [91]:
!csvcut -c1,5,7 q4.csv | csvgrep -c3 -m '21st & I St NW' | csvsort -c2 | head -10 | csvlook

|----------------+-----------------------+-----------------|
|  Duration (ms) | Start station         | End station     |
|----------------+-----------------------+-----------------|
|  890758        | 10th & E St NW        | 21st & I St NW  |
|  1074547       | 10th & E St NW        | 21st & I St NW  |
|  751900        | 10th & E St NW        | 21st & I St NW  |
|  938747        | 10th & Florida Ave NW | 21st & I St NW  |
|  1001635       | 10th & Florida Ave NW | 21st & I St NW  |
|  837341        | 10th & Florida Ave NW | 21st & I St NW  |
|  891224        | 10th & Florida Ave NW | 21st & I St NW  |
|  1070924       | 10th & Florida Ave NW | 21st & I St NW  |
|  799070        | 10th & Florida Ave NW | 21st & I St NW  |
|----------------+-----------------------+-----------------|


And you can perform basic statistics very easily:

In [92]:
!csvcut -c1,5,7 q4.csv | csvgrep -c3 -m '21st & I St NW' | csvcut -c1 | csvstat

  1. Duration (ms)
	<type 'int'>
	Nulls: False
	Min: 65972
	Max: 29254066
	Sum: 3721091088
	Mean: 867387.2
	Median: 678586.5
	Standard Deviation: 1148902.39491
	Unique values: 4281
	5 most frequent values:
		948623:	2
		889715:	2
		417735:	2
		641735:	2
		716896:	2

Row count: 4290
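
The mean and median reported by `csvstat` are ordinary summary statistics, and durations in milliseconds are easier to interpret once converted to minutes. A minimal sketch with Python's `statistics` module, using only the first three "21st & I St NW" durations shown earlier rather than the full 4290 rows:

```python
import statistics

# Durations in milliseconds: the first three '21st & I St NW' rows shown earlier.
durations_ms = [872420, 1249207, 548066]

mean_ms = statistics.mean(durations_ms)
median_ms = statistics.median(durations_ms)

# 60,000 milliseconds per minute.
mean_minutes = mean_ms / 60000

print(mean_ms, median_ms, round(mean_minutes, 1))
```

Applying the same conversion to the full-data mean of 867387.2 ms gives about 14.5 minutes per trip.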


### Your turn

Which set of trips had the longer average trip duration:  trips *starting* at "Massachusetts Ave & Dupont Circle NW", or trips *ending* at "Massachusetts Ave & Dupont Circle NW"?

Use as many new cells as you need to compute the answer, and then write in your answer below in the "YOUR ANSWER HERE" cell.

In [93]:
!csvcut -c1,5,7 q4.csv | csvgrep -c2 -m 'Massachusetts Ave & Dupont Circle NW' | csvcut -c1 | csvstat

  1. Duration (ms)
	<type 'int'>
	Nulls: False
	Min: 61260
	Max: 74022396
	Sum: 12050906879
	Mean: 946431.07508
	Median: 607205
	Standard Deviation: 1910753.05934
	Unique values: 12672
	5 most frequent values:
		257138:	3
		231962:	2
		305764:	2
		497941:	2
		564024:	2

Row count: 12733

In [94]:
!csvcut -c1,5,7 q4.csv | csvgrep -c3 -m 'Massachusetts Ave & Dupont Circle NW' | csvcut -c1 | csvstat

  1. Duration (ms)
	<type 'int'>
	Nulls: False
	Min: 60476
	Max: 74676819
	Sum: 12098780287
	Mean: 840543.301862
	Median: 569422.0
	Standard Deviation: 1767359.30721
	Unique values: 14309
	5 most frequent values:
		846668:	2
		581352:	2
		368928:	2
		258948:	2
		453439:	2

Row count: 14394


In [96]:
print("Trips starting at Massachusetts Ave & Dupont Circle NW had the longer average duration: a mean of 946431.07508 ms (about 15.8 minutes), versus 840543.301862 ms (about 14.0 minutes) for trips ending there.")

Trips starting at Massachusetts Ave & Dupont Circle NW had the longer average duration: a mean of 946431.07508 ms (about 15.8 minutes), versus 840543.301862 ms (about 14.0 minutes) for trips ending there.
