Loading and accessing data in Python

We're interested in enabling reproducible statistical analyses, so we're going to focus on both accessing our data and executing statistical analyses programmatically. In this assignment, we'll work through accessing data in a standardized fashion using the Python programming language.

Why Python?

Although a wide variety of programming languages are available for data analysis, we've chosen to focus on Python because it has several important characteristics:

It's free and open source software (FOSS), meaning that it's accessible without a subscription plan
It's a great general purpose scripting language, used in a wide variety of industries today
It has a strong ecosystem for scientific analyses

For statistics, the other very popular language to know is R. I highly recommend checking it out, even though we won't be using it in this course.

How to get started

To get a working Python environment, please install Miniconda into the computational environment you set up in A1. Specifically, please make sure to install the Python 3.7 build rather than the Python 2 build, as there are significant differences between the two major Python versions; if you're interested, you can learn more about some of these differences here.

Once you have installed Miniconda, you will have access to a working Python environment. First, we'll use pip, a Python package management system, to install several packages we'll need for the assignment. From an open terminal, execute the following commands:

pip install ipython
pip install numpy
pip install pandas

Notice that these commands are executed in your terminal, outside of the Python interpreter! Now we can start an interactive Python session (IPython) using the following command:

ipython
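Once IPython is running, a quick check like the one below can confirm that the interpreter is a Python 3 build and that numpy and pandas import correctly. This is a minimal sketch; the function name `check_environment` and the 3.7 minimum are our own choices, not part of the starter code.

```python
import sys

import numpy as np
import pandas as pd


def check_environment(min_version=(3, 7)):
    """Return interpreter and package versions (hypothetical helper).

    Raises RuntimeError if the running Python is older than min_version.
    """
    if sys.version_info[:2] < min_version:
        raise RuntimeError(
            f"Python {min_version[0]}.{min_version[1]}+ expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # If the imports at the top succeeded, numpy and pandas are installed.
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


print(check_environment())
```

If either import fails with `ModuleNotFoundError`, the corresponding `pip install` command likely ran in a different environment than the one IPython is using.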

Accessing data

For this assignment, we'll be using the classic iris dataset. We'll ask you to load the dataset and provide some descriptive statistics. I've started the code for you in the associated descriptive_statistics.py file.
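The starter code's loading approach isn't described here, so the following is only a minimal sketch, assuming a local, headerless CSV in the common UCI layout (four numeric measurements followed by a class label such as `Iris-setosa`). The function name `load_iris` and the column names in `IRIS_COLUMNS` are hypothetical.

```python
import pandas as pd

# Hypothetical column names: the assumed raw file has no header row.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_iris(source):
    """Load a headerless iris CSV (four measurements, then a class label)."""
    df = pd.read_csv(source, header=None, names=IRIS_COLUMNS)
    # Normalize labels such as "Iris-setosa" to "setosa".
    df["species"] = df["species"].str.replace("Iris-", "", regex=False)
    return df
```

For example, `iris = load_iris("iris.data")`, where `iris.data` is a hypothetical local path; `pd.read_csv` also accepts a URL or an open file object.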

Calculating descriptive statistics

In your copy of the repository, please extend the descriptive_statistics.py code to calculate descriptive statistics using either numpy or pandas.
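One way to compute the statistics needed for the table below is a pandas group-by; a numpy version is included for comparison. Note that the two libraries use different defaults: pandas' `std()` computes the sample standard deviation (`ddof=1`), while numpy's `std()` computes the population standard deviation (`ddof=0`) unless told otherwise, so the reported values can differ slightly. This sketch assumes a DataFrame with hypothetical `species` and `sepal_length` columns; the starter code may use other names.

```python
import numpy as np
import pandas as pd


def sepal_length_summary(iris: pd.DataFrame) -> pd.DataFrame:
    """Mean and sample standard deviation of sepal length per species (pandas)."""
    # pandas' std aggregation defaults to ddof=1 (sample standard deviation).
    return iris.groupby("species")["sepal_length"].agg(["mean", "std"])


def sepal_length_summary_numpy(iris: pd.DataFrame) -> dict:
    """The same statistics computed with numpy arrays, one species at a time."""
    summary = {}
    for species in np.unique(iris["species"].to_numpy()):
        values = iris.loc[iris["species"] == species, "sepal_length"].to_numpy()
        # np.std defaults to ddof=0 (population); pass ddof=1 to match pandas.
        summary[species] = (values.mean(), values.std(ddof=1))
    return summary
```

Whichever library you choose, stating which standard deviation you report (sample or population) makes the submitted numbers reproducible.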

How to submit

When you are ready to submit, please open a pull request on your copy of the repository with the modified descriptive_statistics.py code and the completed table below:

Iris type	Mean Sepal Length (cm)	Standard Deviation of Sepal Length (cm)
Setosa	XX	XX
Versicolor	XX	XX
Virginica	XX	XX
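To fill in the table, a small helper can format per-species results as tab-separated rows. This sketch assumes a DataFrame indexed by species name with `mean` and `std` columns (for example, the output of a pandas `groupby(...).agg(["mean", "std"])`); the helper name `format_table_rows` is hypothetical, and rounding to two decimals is our own assumption, since the table doesn't specify a precision.

```python
import pandas as pd


def format_table_rows(summary: pd.DataFrame, decimals: int = 2) -> str:
    """Format a per-species mean/std DataFrame as tab-separated table rows."""
    rows = []
    for species, stats in summary.iterrows():
        # Capitalize the species label to match the table ("setosa" -> "Setosa").
        rows.append(
            f"{species.capitalize()}\t{stats['mean']:.{decimals}f}\t{stats['std']:.{decimals}f}"
        )
    return "\n".join(rows)
```

Printing the returned string gives one line per iris type, ready to paste under the table header.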

reprocourse/template-a2-data-in-python

Folders and files

Latest commit

History

Repository files navigation

Loading and accessing data in Python

Why Python?

How to get started

Accessing data

Calculating descriptive statistics

How to submit

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages