Worked through the series chapter
palewire committed Sep 16, 2022
1 parent 6c277f1 commit 5a13d52
Showing 1 changed file with 63 additions and 38 deletions.
101 changes: 63 additions & 38 deletions jupyterlite/fast-python-notebook/notebook.ipynb
@@ -47,10 +47,10 @@
"id": "d6129fc8-71b6-40ed-a8c5-e18f50a5d981",
"metadata": {},
"source": [
"![jupyter](https://jupyter.org/assets/homepage/labpreview.webp)\n",
"\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"![jupyter](https://palewi.re/docs/first-python-notebook/_static/img/labpreview.webp)\n",
"\n",
"A [Jupyter](https://jupyter.org/) notebook is a browser-based interface where you can write, run, remix and republish code. It is free software that anyone can install and run.\n",
"\n",
"[Scientists](https://nbviewer.jupyter.org/github/robertodealmeida/notebooks/blob/master/earth_day_data_challenge/Analyzing%20whale%20tracks.ipynb), [scholars](https://nbviewer.jupyter.org/github/nealcaren/workshop_2014/blob/master/notebooks/5_Times_API.ipynb), [investors](https://github.com/rsvp/fecon235/blob/master/nb/fred-debt-pop.ipynb) and [corporations](https://netflixtechblog.com/notebook-innovation-591ee3221233) use Jupyter to create and share their research. It is also used by journalists to develop stories and show their work. Examples include:\n",
@@ -306,26 +306,24 @@
"## What is Pandas?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fb66d2d-cd31-4e26-9531-63c05c8b3dd3",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "6a769ba4-a9a9-4c26-bbfd-a14ea202b81a",
"metadata": {},
"source": [
"Lucky for us, Python is filled with functions to do pretty much anything you’d ever want to do with a programming language: [navigate the web](http://docs.python-requests.org/), [parse data](https://docs.python.org/2/library/csv.html), [interact with a database](http://www.sqlalchemy.org/), [run fancy statistics](https://www.scipy.org/), [build a pretty website](https://www.djangoproject.com/) and [so](https://www.crummy.com/software/BeautifulSoup/) [much](http://www.nltk.org/) [more](https://pillow.readthedocs.io/en/stable/).\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"![pandas on the Python Package Index](https://palewi.re/docs/first-python-notebook/_static/img/pandas-pypi.png)\n",
"\n",
"Lucky for us, Python is filled with functions to do almost anything you’d want to do with a programming language: [navigate the web](http://docs.python-requests.org/), [parse data](https://docs.python.org/2/library/csv.html), [interact with a database](http://www.sqlalchemy.org/), [run fancy statistics](https://www.scipy.org/), [build a pretty website](https://www.djangoproject.com/) and [so](https://www.crummy.com/software/BeautifulSoup/) [much](http://www.nltk.org/) [more](https://pillow.readthedocs.io/en/stable/).\n",
"\n",
"Creative people have put these tools to work to get a [wide range of things](https://www.python.org/about/success/) done in the academy, the laboratory and even in outer space.\n",
"\n",
"Some of those tools are included in a toolbox that comes with the language, known as the standard library. Others have been built by members of Python’s developer community and need to be downloaded and installed from the web. One third-party tool that’s important for this class is called [pandas](https://pandas.pydata.org/). It is a tool invented at a [financial investment firm](https://www.aqr.com/) that has become a leading open-source library for accessing and analyzing data in many different fields.\n",
"Some of those tools are included in a toolbox that comes with the language, known as the standard library. Others have been built by members of Python’s developer community and need to be separately downloaded and installed. One third-party tool that’s important for this class is called [pandas](https://pandas.pydata.org/). Invented by programmers at a [financial investment firm](https://www.aqr.com/), it has become a leading open-source library for accessing and analyzing data.\n",
"\n",
"Here’s how to use pandas yourself. Run the following: "
"Here’s how to use pandas yourself. Run the following:\n",
" \n",
"</div>"
]
},
{
@@ -343,16 +341,20 @@
"id": "f05db38a-f532-4ba6-bd9f-0f7700a2eeb6",
"metadata": {},
"source": [
"If nothing happens, that’s good. It means you have pandas installed and ready to use.\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"If nothing happens, that’s good. It means you have it installed and ready to use.\n",
"\n",
"> Note: Since pandas is created by a third party independent from the core Python developers, it may not be available by default if you manually installed Python and Jupyter. It’s available here because our students are using JupyterLab Desktop, whose developers have curated a list of common utilities to include with their installation. Consult our [advanced installation guide](https://palewi.re/docs/first-python-notebook/appendix/index.html) if the cell above threw an error.\n",
"> Note: Since pandas is created by a third party independent from the core Python developers, it may not be available by default if you manually installed Python and Jupyter. It’s available here because the developers of JupyterLite have curated a list of common utilities to include with their distribution. Consult our [advanced installation guide](https://palewi.re/docs/first-python-notebook/appendix/index.html) if the cell above threw an error.\n",
"\n",
"Now let's run the same code again, but with a small addition."
"Now let's run the same code again, but with a small addition.\n",
" \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"id": "bd9c5ff8-a7f4-409d-a7e2-550787b1dd33",
"metadata": {},
"outputs": [],
@@ -365,16 +367,24 @@
"id": "81953ecf-f91a-4289-a7ea-b03289767de4",
"metadata": {},
"source": [
"This will alias the pandas library at the shorter variable name of `pd`. This is standard practice in the pandas community. You will frequently see examples of pandas code online using pd as shorthand. It’s not required, but it’s good to get in the habit so that your code will be understood by other computer programmers.\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"This will import the pandas library under the shorter variable name `pd`. This is standard practice in the pandas community. You will frequently see examples of pandas code online using `pd` as shorthand. It’s not required, but it’s good to get in the habit so that your code will be better understood by other computer programmers.\n",
"\n",
"Those two little letters contain dozens of data analysis tools that we’ll use in future lessons. They can import massive data files, compute advanced statistics, filter, sort, rank and do just about anything else you’d want to do.\n",
"\n",
"We’ll get to all of that soon enough, but let’s start out with something simple. Let’s make a list of numbers in a new notebook cell. To keep things simple, enter all of the even numbers between zero and ten. Press play.\n"
"We’ll get to all of that soon enough, but let’s start out with something simple. Let's run some simple stats.\n",
"\n",
"## Calculating descriptive statistics\n",
"\n",
"Start by making a list of numbers in a new notebook cell. To keep things simple, we'll start with all of the even numbers between zero and ten. Note the variable name I've assigned. Then press play.\n",
" \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 15,
"id": "76408f89-55ed-4996-ac0e-c520c91cb9f5",
"metadata": {},
"outputs": [],
@@ -387,14 +397,19 @@
"id": "85362e2b-756e-4f9e-bfa6-550311636793",
"metadata": {},
"source": [
"If you’re a skilled Python programmer, you can do some cool stuff with any list, including run statistics. But if you hand over to pandas instead, you’ll be impressed by how easily you can analyze the data without knowing much computer code at all.\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"In this case, it’s as simple as converting that plain Python list into what pandas calls a Series. Here’s how to make it happen in your next cell."
"If you’re a skilled Python programmer, you can do some cool stuff with any list, including running statistics. But if you hand it over to pandas instead, you’ll be impressed by how easily you can analyze the data without much computer code.\n",
"\n",
"\n",
"In this case, it’s as simple as converting that plain Python list into what pandas calls a [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html). Here’s how to make it happen:\n",
" \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 16,
"id": "2855d3fe-4886-4715-ad7a-38bf0f8852dc",
"metadata": {},
"outputs": [],
@@ -407,14 +422,18 @@
"id": "6f121e7f-7abf-4f00-aae0-cc7dddadee05",
"metadata": {},
"source": [
"Once the data becomes a Series, you can immediately run a wide range of [descriptive statistics](https://en.wikipedia.org/wiki/Descriptive_statistics). Let’s try a few.\n",
"<div style=\"max-width: 640px\">\n",
"\n",
"Once the data becomes a `Series`, you can immediately run a wide range of [descriptive statistics](https://en.wikipedia.org/wiki/Descriptive_statistics). Let’s try a few.\n",
"\n",
"First, let’s sum all the numbers."
"First, let’s sum all the numbers.\n",
" \n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 17,
"id": "756d433a-5379-4df9-acea-252067ff7fff",
"metadata": {},
"outputs": [
@@ -424,7 +443,7 @@
"20"
]
},
"execution_count": 12,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -443,7 +462,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 18,
"id": "8f62ab11-b0bc-4565-9c3b-ce48d9229414",
"metadata": {},
"outputs": [
@@ -453,7 +472,7 @@
"8"
]
},
"execution_count": 13,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
@@ -474,7 +493,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 19,
"id": "9b3036dd-b239-4f08-903f-989989c257f8",
"metadata": {},
"outputs": [
@@ -484,7 +503,7 @@
"2"
]
},
"execution_count": 14,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -503,7 +522,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 20,
"id": "3c4cfda5-eb9c-4279-ade2-12df439c3f81",
"metadata": {},
"outputs": [
@@ -513,7 +532,7 @@
"5.0"
]
},
"execution_count": 15,
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -532,7 +551,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 21,
"id": "5e4cd26c-a2e3-495f-818d-339b10a91040",
"metadata": {},
"outputs": [
@@ -542,7 +561,7 @@
"5.0"
]
},
"execution_count": 16,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
@@ -561,7 +580,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 22,
"id": "b94eb9e9-a92d-4651-8861-9a40545f2de1",
"metadata": {},
"outputs": [
@@ -571,7 +590,7 @@
"2.581988897471611"
]
},
"execution_count": 17,
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
@@ -622,7 +641,13 @@
"id": "feb421c5-3519-42f3-a4f1-94d1d285f8c8",
"metadata": {},
"source": [
"Substitute in a series of 10 million records — or even just the odd numbers between zero and ten — and your notebook would calculate all those same statistics without you needing to write any more code. Once your data, however large or complex, is imported into pandas, there’s little limit to what you can do to filter, merge, group, aggregate, compute or chart using simple methods like the ones above."
"<div style=\"max-width: 640px\">\n",
"\n",
"Before you move on, go back to the `my_list` variable and change the list. Maybe add a few more values. Or switch to odds. Then rerun all the cells above. You'll see all the statistics update to reflect the different dataset.\n",
"\n",
"Substitute in a series of 10 million records and your notebook would calculate all the same statistics without you needing to write any more code. Once your data, however large or complex, is imported into pandas, simple statistics become a snap.\n",
"\n",
"</div>"
]
},
{
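
The code cells touched by this commit are collapsed in the diff, so their source is not shown. The sketch below is one plausible reading of what the chapter's prose and the recorded cell outputs imply, not the notebook's actual code. It assumes the list holds `[2, 4, 6, 8]`, which is consistent with the outputs 20, 8, 2, 5.0, 5.0 and 2.581988897471611 recorded above. Only `my_list` is named in the notebook text; `my_series` is a hypothetical name, and the choice and order of methods is a guess.

```python
import pandas as pd  # the conventional `pd` alias described in the notebook

# Assumed contents: the even numbers between zero and ten.
# The real cell source is collapsed in the diff.
my_list = [2, 4, 6, 8]

# Convert the plain Python list into a pandas Series.
# `my_series` is a hypothetical name chosen for this sketch.
my_series = pd.Series(my_list)

print(my_series.sum())     # 20
print(my_series.max())     # 8
print(my_series.min())     # 2
print(my_series.mean())    # 5.0
print(my_series.median())  # 5.0
print(my_series.std())     # sample standard deviation, about 2.582
```

Changing the values in `my_list` and rerunning the block updates every statistic, which is the exercise the final markdown cell suggests.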