Docs on how to read in the CSV
palewire committed Feb 12, 2017
1 parent 2391f97 commit 32e0d41
Showing 16 changed files with 1,407 additions and 22 deletions.
1,279 changes: 1,279 additions & 0 deletions Untitled.ipynb

Large diffs are not rendered by default.

Binary file modified docs/_build_html/.doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build_html/.doctrees/index.doctree
Binary file not shown.
Binary file added docs/_build_html/_images/head.png
Binary file added docs/_build_html/_images/info.png
Binary file added docs/_build_html/_images/read_csv.png
55 changes: 48 additions & 7 deletions docs/_build_html/_sources/index.rst.txt
@@ -234,19 +234,60 @@ Use the next open box to import pandas into our script, so we can use all its fa
import pandas
Run the notebook cell. If nothing happens, that's good. It means you have pandas installed and ready to work.

If you get an error message, return to the prerequisites section above and make sure you have everything installed properly. If you do and it still doesn't work, copy and paste the tail end of your error message into Google. Among the results there will almost certainly be others working through the same problem.


Act 3: Hello analysis
---------------------

Until last November, the use and sale of marijuana for recreational purposes was illegal in California. That changed when voters approved Proposition 64, which asked voters if the practice ought to be legalized. A "yes" vote supported legalization. A "no" vote opposed it. `In the final tally <http://elections.cdn.sos.ca.gov/sov/2016-general/sov/65-ballot-measures-formatted.pdf>`_, 57% of voters said yes.

`According to California's Secretary of State <http://www.sos.ca.gov/campaign-lobbying/cal-access-resources/measure-contributions/marijuana-legalization-initiative-statute/>`_, approximately $23 million was raised to campaign in support of Prop. 64. Almost $2 million was raised to oppose it.

Your mission, should you choose to accept it, is to analyze lists of campaign committees and contributors to figure out the biggest donors both for and against the measure.

To start, `click here <https://raw.githubusercontent.com/california-civic-data-coalition/first-python-notebook/master/docs/_static/prop-committees.csv>`_ to download a list of last November's 17 ballot measures and their affiliated fundraising committees.

The data are structured in rows of comma-separated values. This is known as a CSV file. It is the most common way you will find data published online. Save the file with the name ``prop-committees.csv`` in the same directory where you made your notebook.

Open the file in your notebook using the `read_csv <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html>`_ function in ``pandas``.

.. code-block:: python

    pandas.read_csv("./docs/_static/prop-committees.csv")

After you run the cell, you should see something like this.

.. image:: /_static/read_csv.png


It is a ``DataFrame`` where ``pandas`` has structured the CSV data into rows and columns, just as Excel or other spreadsheet software might. The advantage offered here is that rather than manipulating the data through a haphazard series of clicks and keypunches, we will be gradually grinding down the data using a computer programming script that is 100% transparent and reproducible.
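
For readers who want to try ``read_csv`` before downloading anything, here is a minimal sketch. It assumes a few made-up rows fed in through ``io.StringIO``; the column names and committee names are hypothetical and are not taken from the real ``prop-committees.csv``.

```python
import io

import pandas

# Hypothetical sample rows, invented for illustration only.
sample_csv = io.StringIO(
    "prop_name,committee_name,committee_position\n"
    "PROPOSITION 064,EXAMPLE COMMITTEE FOR 64,SUPPORT\n"
    "PROPOSITION 064,EXAMPLE COMMITTEE AGAINST 64,OPPOSE\n"
    "PROPOSITION 051,EXAMPLE SCHOOL BOND COMMITTEE,SUPPORT\n"
)

# read_csv accepts a file path or a file-like object such as StringIO.
frame = pandas.read_csv(sample_csv)

# The first line of the CSV becomes the column names.
print(type(frame).__name__)
print(frame.shape)
```

If you saved the real file next to your notebook, passing just ``"prop-committees.csv"`` as the path should work the same way.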

In order to do that, we need to store our ``DataFrame`` so it can be reused in subsequent cells. We can do this by saving it in a `"variable" <https://en.wikipedia.org/wiki/Variable_(computer_science)>`_, which is a fancy computer programming word for a named shortcut where we save our work as we go.

Go back to your initial cell and change it to this. Then rerun it.

.. code-block:: python

    props = pandas.read_csv("./docs/_static/prop-committees.csv")

After you run it, you shouldn't see anything. That's a good thing. It means our ``DataFrame`` has been saved under the name ``props``, which we can now begin interacting with in the cells that follow. We can do this by calling `"methods" <https://en.wikipedia.org/wiki/Method_(computer_programming)>`_ that ``pandas`` has made available to all ``DataFrames``. There are dozens of these that can do all sorts of interesting things. Let's start with some easy ones that analysts use all the time.
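
As a rough sketch of what the variable buys us, the example below saves a small ``DataFrame`` once and then reuses it twice. The sample rows and column names are made up for illustration and are not the contents of the real file.

```python
import io

import pandas

# Hypothetical sample data standing in for prop-committees.csv.
csv_text = (
    "prop_name,committee_name\n"
    "PROPOSITION 064,EXAMPLE COMMITTEE FOR 64\n"
    "PROPOSITION 064,EXAMPLE COMMITTEE AGAINST 64\n"
)

# Assigning the result to a name displays nothing; it just stores the DataFrame.
props = pandas.read_csv(io.StringIO(csv_text))

# Later cells can keep using the same name without rereading the file.
row_count = len(props)
column_names = list(props.columns)

print(row_count)
print(column_names)
```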

First, to preview the first few rows of the dataset, try the `head <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html>`_ method.

.. code-block:: python

    props.head()

.. image:: /_static/head.png
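
By default ``head`` returns the first five rows, and it accepts a number if you want more or fewer. The sketch below illustrates that with six invented rows; the data and column names are hypothetical.

```python
import io

import pandas

# Six hypothetical rows so the default cutoff of five is visible.
sample_csv = io.StringIO(
    "prop_name,committee_name\n"
    "PROPOSITION 051,EXAMPLE COMMITTEE A\n"
    "PROPOSITION 052,EXAMPLE COMMITTEE B\n"
    "PROPOSITION 053,EXAMPLE COMMITTEE C\n"
    "PROPOSITION 054,EXAMPLE COMMITTEE D\n"
    "PROPOSITION 055,EXAMPLE COMMITTEE E\n"
    "PROPOSITION 056,EXAMPLE COMMITTEE F\n"
)
props = pandas.read_csv(sample_csv)

# With no argument, head() keeps the first five rows.
default_preview = props.head()

# Passing a number changes how many rows come back.
short_preview = props.head(2)

print(len(default_preview))
print(len(short_preview))
```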

To get a look at all of the columns and what type of data they store, try `info <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html>`_.

.. code-block:: python

    props.info()

.. image:: /_static/info.png
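
One detail that can trip up newcomers: ``info`` prints its summary rather than handing back a value. The sketch below, again using made-up rows and hypothetical column names, shows that and pulls the same details out as ordinary values.

```python
import io

import pandas

# Hypothetical sample data with one numeric and two text columns.
sample_csv = io.StringIO(
    "committee_id,committee_name,committee_position\n"
    "1001,EXAMPLE COMMITTEE FOR 64,SUPPORT\n"
    "1002,EXAMPLE COMMITTEE AGAINST 64,OPPOSE\n"
)
props = pandas.read_csv(sample_csv)

# info() prints the column summary and returns None.
result = props.info()

# The column names and data types are also available as values.
column_names = list(props.columns)
id_is_integer = props["committee_id"].dtype.kind == "i"

print(column_names)
print(id_is_integer)
```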
Binary file added docs/_build_html/_static/head.png
Binary file added docs/_build_html/_static/info.png
Binary file added docs/_build_html/_static/read_csv.png
38 changes: 31 additions & 7 deletions docs/_build_html/index.html
@@ -291,16 +291,40 @@ <h2>Act 2: Hello pandas<a class="headerlink" href="#act-2-hello-pandas" title="P
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span>
</pre></div>
</div>
<p>Run the notebook cell. If nothing happens, that&#8217;s good. It means you have pandas installed and ready to work.</p>
<p>If you get an error message, return to the prerequisites section above and make sure you have everything installed properly. If you do and it still doesn&#8217;t work, copy and paste the tail end of your error message into Google. Among the results there will almost certainly be others working through the same problem.</p>
</div>
<div class="section" id="act-3-hello-analysis">
<h2>Act 3: Hello analysis<a class="headerlink" href="#act-3-hello-analysis" title="Permalink to this headline"></a></h2>
<p>Until last November, the use and sale of marijuana for recreational purposes was illegal in California. That changed when voters approved Proposition 64, which asked voters if the practice ought to be legalized. A &#8220;yes&#8221; vote supported legalization. A &#8220;no&#8221; vote opposed it. <a class="reference external" href="http://elections.cdn.sos.ca.gov/sov/2016-general/sov/65-ballot-measures-formatted.pdf">In the final tally</a>, 57% of voters said yes.</p>
<p><a class="reference external" href="http://www.sos.ca.gov/campaign-lobbying/cal-access-resources/measure-contributions/marijuana-legalization-initiative-statute/">According to California&#8217;s Secretary of State</a>, approximately $23 million was raised to campaign in support of Prop. 64. Almost $2 million was raised to oppose it.</p>
<p>Your mission, should you choose to accept it, is to analyze lists of campaign committees and contributors to figure out the biggest donors both for and against the measure.</p>
<p>To start, <a class="reference external" href="https://raw.githubusercontent.com/california-civic-data-coalition/first-python-notebook/master/docs/_static/prop-committees.csv">click here</a> to download a list of last November&#8217;s 17 ballot measures and their affiliated fundraising committees.</p>
<p>The data are structured in rows of comma-separated values. This is known as a CSV file. It is the most common way you will find data published online. Save the file with the name <code class="docutils literal"><span class="pre">prop-committees.csv</span></code> in the same directory where you made your notebook.</p>
<p>Open the file in your notebook using the <a class="reference external" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html">read_csv</a> function in <code class="docutils literal"><span class="pre">pandas</span></code>.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">pandas</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;./docs/_static/prop-committees.csv&quot;</span><span class="p">)</span>
</pre></div>
</div>
<p>After you run the cell, you should see something like this.</p>
<img alt="_images/read_csv.png" src="_images/read_csv.png" />
<p>It is a <code class="docutils literal"><span class="pre">DataFrame</span></code> where <code class="docutils literal"><span class="pre">pandas</span></code> has structured the CSV data into rows and columns, just as Excel or other spreadsheet software might. The advantage offered here is that rather than manipulating the data through a haphazard series of clicks and keypunches, we will be gradually grinding down the data using a computer programming script that is 100% transparent and reproducible.</p>
<p>In order to do that, we need to store our <code class="docutils literal"><span class="pre">DataFrame</span></code> so it can be reused in subsequent cells. We can do this by saving it in a <a class="reference external" href="https://en.wikipedia.org/wiki/Variable_(computer_science)">&#8220;variable&#8221;</a>, which is a fancy computer programming word for a named shortcut where we save our work as we go.</p>
<p>Go back to your initial cell and change it to this. Then rerun it.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">props</span> <span class="o">=</span> <span class="n">pandas</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;./docs/_static/prop-committees.csv&quot;</span><span class="p">)</span>
</pre></div>
</div>
<p>After you run it, you shouldn&#8217;t see anything. That&#8217;s a good thing. It means our <code class="docutils literal"><span class="pre">DataFrame</span></code> has been saved under the name <code class="docutils literal"><span class="pre">props</span></code>, which we can now begin interacting with in the cells that follow. We can do this by calling <a class="reference external" href="https://en.wikipedia.org/wiki/Method_(computer_programming)">&#8220;methods&#8221;</a> that <code class="docutils literal"><span class="pre">pandas</span></code> has made available to all <code class="docutils literal"><span class="pre">DataFrames</span></code>. There are dozens of these that can do all sorts of interesting things. Let&#8217;s start with some easy ones that analysts use all the time.</p>
<p>First, to preview the first few rows of the dataset, try the <a class="reference external" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html">head</a> method.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">props</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
<img alt="_images/head.png" src="_images/head.png" />
<p>To get a look at all of the columns and what type of data they store, try <a class="reference external" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html">info</a>.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">props</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
</pre></div>
</div>
<img alt="_images/info.png" src="_images/info.png" />
</div>
</div>