
reprocourse/template-a2-data-in-python

Loading and accessing data in Python

We're interested in enabling reproducible statistical analyses; we're therefore going to focus on both accessing our data as well as executing statistical analyses programmatically. In this assignment, we'll work through accessing data in a standardized fashion using the Python programming language.

Why Python?

Although there are a wide variety of programming languages available for data analysis, we've chosen to focus on Python. We've made this choice since Python has several important characteristics:

  • It's free and open source software (FOSS), meaning that it's accessible without a subscription plan
  • It's a great general purpose scripting language, used in a wide variety of industries today
  • It has a strong ecosystem for scientific analyses

For statistics, the other very popular language to know is R—I highly recommend you check it out, even though we won't be using it in this course.

How to get started

To get a working Python environment, please install Miniconda into the computational environment you set up in A1. Specifically, please make sure to install the Python 3.7 build, as there are significant differences between Python 2 and Python 3; if you're interested, you can learn more about some of these differences here.

Once you have installed Miniconda, you will have access to a working Python environment. First, we'll use pip, a Python package management system, to install several packages we'll need to execute the assignment. From an open terminal, you can execute the following commands:

pip install ipython
pip install numpy
pip install pandas

Notice that these commands are executed outside of the Python environment! Now we can start Python using the following command:

ipython
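Once the ipython prompt is open, a quick check like the sketch below can confirm that the interpreter and the two libraries are available. It only prints version information; the exact version numbers you see will depend on your installation.

```python
import sys

import numpy as np
import pandas as pd

# Report the interpreter's major and minor version (the course asks for Python 3).
print("Python:", sys.version_info[:2])

# Report the installed library versions; importing without an error
# already shows that the pip installs succeeded.
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
```

If either import fails with a ModuleNotFoundError, the corresponding pip command most likely ran in a different environment from the one ipython was started in.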

Accessing data

For this assignment, we'll be using the classic iris dataset. We'll ask you to load the dataset and provide some descriptive statistics. I've started the code for you in the associated descriptive_statistics.py file.
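As a rough sketch of what loading tabular data looks like in pandas, the example below reads a few comma-separated rows from an in-memory string. The column names, the lowercase species labels, and the handful of rows are assumptions made for illustration; the starter descriptive_statistics.py file may load the data differently and name its columns differently.

```python
import io

import pandas as pd

# A few illustrative rows in an iris-like layout (assumed column names).
# In practice you would pass a file path to pd.read_csv instead of a string.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# read_csv accepts any file-like object, so StringIO stands in for a file here.
iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Inspect the result: number of rows and columns, column types, first rows.
print(iris.shape)
print(iris.dtypes)
print(iris.head())
```

Checking the shape and column types right after loading is a cheap way to catch a wrong delimiter or a missing header before any statistics are computed.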

Calculating descriptive statistics

In your copy of the repository, please extend the descriptive_statistics.py code to include calculating descriptive statistics in either numpy or pandas.
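One possible shape for the calculation is a per-group mean and standard deviation. The sketch below uses a small toy table with made-up group labels and values (not the iris data), and the column names are assumptions. It also shows one detail worth knowing when choosing between the two libraries: pandas computes the sample standard deviation (ddof=1) by default, while numpy's np.std defaults to the population version (ddof=0).

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical labels and values, chosen so the results are easy to check by hand.
df = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal_length": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# pandas: group rows by label, then compute mean and sample standard deviation.
summary = df.groupby("species")["sepal_length"].agg(["mean", "std"])
print(summary)

# numpy: the same statistics for a single group, selected with a boolean mask.
values_a = df.loc[df["species"] == "a", "sepal_length"].to_numpy()
np_mean = np.mean(values_a)
np_std_sample = np.std(values_a, ddof=1)   # matches the pandas default
np_std_population = np.std(values_a)       # numpy default, ddof=0
print(np_mean, np_std_sample, np_std_population)
```

Whichever library you use, state which standard deviation you report, since the two defaults give different numbers on the same data.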

How to submit

When you are ready to submit, please open a pull request on your copy of the repository with the modified descriptive_statistics.py code and the table below completed:

| Iris type | Mean Sepal Length | Standard Deviation of Sepal Length |
|---|---|---|
| Setosa | XX | XX |
| Versicolor | XX | XX |
| Virginica | XX | XX |
