<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Programming with Python</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Programming with Python</h1></a>
<h2 class="subtitle">Analyzing Data from Multiple Files</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Use a library function to get a list of filenames that match a simple wildcard pattern.</li>
<li>Use a for loop to process multiple files.</li>
</ul>
</div>
</section>
<p>We now have almost everything we need to process all our data files. The only thing that’s missing is a library with a rather unpleasant name:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> glob</code></pre></div>
<p>The <code>glob</code> library contains a function, also called <code>glob</code>, that finds files and directories whose names match a pattern. We provide those patterns as strings: the character <code>*</code> matches zero or more characters, while <code>?</code> matches any one character. We can use this to get the names of all the inflammation CSV files in the <code>data</code> directory:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="bu">print</span>(glob.glob(<span class="st">'data/inflammation*.csv'</span>))</code></pre></div>
<pre class="output"><code>['data/inflammation-05.csv', 'data/inflammation-11.csv', 'data/inflammation-12.csv', 'data/inflammation-08.csv', 'data/inflammation-03.csv', 'data/inflammation-06.csv', 'data/inflammation-09.csv', 'data/inflammation-07.csv', 'data/inflammation-10.csv', 'data/inflammation-02.csv', 'data/inflammation-04.csv', 'data/inflammation-01.csv']</code></pre>
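<p>To see the difference between <code>*</code> and <code>?</code> in isolation, here is a minimal sketch. It does not use the lesson data: the temporary directory and the file names <code>run-1.csv</code>, <code>run-2.csv</code>, <code>run-10.csv</code>, and <code>notes.txt</code> are made up for illustration. The results are passed through <code>sorted</code> only to make the printed lists predictable.</p>

```python
import glob
import os
import tempfile

# Create a throwaway directory holding a few empty files.
# The file names are hypothetical and are not part of the lesson data.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['run-1.csv', 'run-2.csv', 'run-10.csv', 'notes.txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    # '*' matches zero or more characters, so all three CSV files match.
    star_paths = glob.glob(os.path.join(tmpdir, 'run-*.csv'))
    # '?' matches exactly one character, so 'run-10.csv' does not match.
    question_paths = glob.glob(os.path.join(tmpdir, 'run-?.csv'))

# Keep only the file names and sort them so the printed lists are predictable.
star_matches = sorted(os.path.basename(p) for p in star_paths)
question_matches = sorted(os.path.basename(p) for p in question_paths)

print(star_matches)
print(question_matches)
```

<p>Neither pattern picks up <code>notes.txt</code>, because its name does not start with <code>run-</code> or end in <code>.csv</code>.</p>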
<p>As the listing of inflammation files shows, <code>glob.glob</code>’s result is a list of file and directory paths in arbitrary order. This means we can loop over it to do something with each filename in turn. In our case, the “something” we want to do is generate a set of plots for each file in our inflammation dataset. If we want to start by analyzing just the first three files in alphabetical order, we can use the <code>sorted</code> built-in function to generate a new sorted list from the <code>glob.glob</code> output:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"><span class="im">import</span> glob
<span class="im">import</span> numpy
<span class="im">import</span> matplotlib.pyplot

<span class="co"># sort the paths, then keep only the first three</span>
filenames <span class="op">=</span> <span class="bu">sorted</span>(glob.glob(<span class="st">'data/inflammation*.csv'</span>))
filenames <span class="op">=</span> filenames[<span class="dv">0</span>:<span class="dv">3</span>]
<span class="cf">for</span> f <span class="op">in</span> filenames:
    <span class="bu">print</span>(f)

    data <span class="op">=</span> numpy.loadtxt(fname<span class="op">=</span>f, delimiter<span class="op">=</span><span class="st">','</span>)

    <span class="co"># one figure per file, with three side-by-side plots</span>
    fig <span class="op">=</span> matplotlib.pyplot.figure(figsize<span class="op">=</span>(<span class="fl">10.0</span>, <span class="fl">3.0</span>))

    axes1 <span class="op">=</span> fig.add_subplot(<span class="dv">1</span>, <span class="dv">3</span>, <span class="dv">1</span>)
    axes2 <span class="op">=</span> fig.add_subplot(<span class="dv">1</span>, <span class="dv">3</span>, <span class="dv">2</span>)
    axes3 <span class="op">=</span> fig.add_subplot(<span class="dv">1</span>, <span class="dv">3</span>, <span class="dv">3</span>)

    axes1.set_ylabel(<span class="st">'average'</span>)
    axes1.plot(numpy.mean(data, axis<span class="op">=</span><span class="dv">0</span>))

    axes2.set_ylabel(<span class="st">'max'</span>)
    axes2.plot(numpy.<span class="bu">max</span>(data, axis<span class="op">=</span><span class="dv">0</span>))

    axes3.set_ylabel(<span class="st">'min'</span>)
    axes3.plot(numpy.<span class="bu">min</span>(data, axis<span class="op">=</span><span class="dv">0</span>))

    fig.tight_layout()
    matplotlib.pyplot.show()</code></pre></div>
<pre class="output"><code>data/inflammation-01.csv</code></pre>
<p><img src="fig/03-loop_49_1.png" alt="Analysis of inflammation-01.csv" /><br />
</p>
<pre class="output"><code>data/inflammation-02.csv</code></pre>
<p><img src="fig/03-loop_49_3.png" alt="Analysis of inflammation-02.csv" /><br />
</p>
<pre class="output"><code>data/inflammation-03.csv</code></pre>
<p><img src="fig/03-loop_49_5.png" alt="Analysis of inflammation-03.csv" /><br />
Sure enough, the maxima of the first two datasets show exactly the same ramp, and their minima show the same staircase structure. The third dataset is different: its maxima are a bit less regular, but its minima are consistently zero.</p>
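<p>The sorting and slicing step used in the loop above can also be looked at on its own. The sketch below starts from a hand-written list that stands in for an unordered <code>glob.glob</code> result (a subset of the paths printed earlier, so no files are needed to run it). The variable names <code>unordered</code>, <code>ordered</code>, and <code>first_three</code> are chosen here for illustration.</p>

```python
# A stand-in for an unordered glob.glob result (a subset of the lesson's paths).
unordered = [
    'data/inflammation-05.csv',
    'data/inflammation-11.csv',
    'data/inflammation-02.csv',
    'data/inflammation-01.csv',
    'data/inflammation-03.csv',
]

# sorted() returns a new list and leaves the original list untouched.
ordered = sorted(unordered)

# The slice [0:3] keeps the items at positions 0, 1, and 2.
first_three = ordered[0:3]

print(first_three)
print(unordered[0])
```

<p>Because the file numbers are zero-padded (<code>01</code>, <code>02</code>, and so on), alphabetical order and numerical order agree here. Without the padding, a name containing <code>10</code> would sort before one containing <code>2</code>.</p>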
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="plotting-differences"><span class="glyphicon glyphicon-pencil"></span>Plotting Differences</h2>
</div>
<div class="panel-body">
<p>Plot the difference between the average of the first dataset and the average of the second dataset, i.e., the difference between the leftmost plots of the first two figures.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="generate-composite-statistics"><span class="glyphicon glyphicon-pencil"></span>Generate Composite Statistics</h2>
</div>
<div class="panel-body">
<p>Use each of the files once to generate a dataset containing values averaged over all patients:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">filenames <span class="op">=</span> glob.glob(<span class="st">'data/inflammation*.csv'</span>)
composite_data <span class="op">=</span> numpy.zeros((<span class="dv">60</span>,<span class="dv">40</span>))
<span class="cf">for</span> f <span class="op">in</span> filenames:
    <span class="co"># sum each new file's data into composite_data as it's read</span>
<span class="co"># and then divide the composite_data by number of samples</span>
composite_data <span class="op">/=</span> <span class="bu">len</span>(filenames)</code></pre></div>
<p>Then use pyplot to plot the average, max, and min for all patients.</p>
</div>
</section>
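<p>The accumulate-then-divide pattern from the challenge above can be tried on toy data first. The sketch below is one possible approach, not the official solution: it writes two tiny 2×3 files named <code>toy-01.csv</code> and <code>toy-02.csv</code> (hypothetical names and values) to a temporary directory, sums them into an array of zeros, and divides by the number of files.</p>

```python
import glob
import os
import tempfile

import numpy

with tempfile.TemporaryDirectory() as tmpdir:
    # Write two tiny stand-in files (2 rows x 3 columns); this is not the lesson data.
    first = numpy.array([[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]])
    second = numpy.array([[2.0, 4.0, 6.0], [4.0, 6.0, 8.0]])
    numpy.savetxt(os.path.join(tmpdir, 'toy-01.csv'), first, delimiter=',')
    numpy.savetxt(os.path.join(tmpdir, 'toy-02.csv'), second, delimiter=',')

    toy_filenames = glob.glob(os.path.join(tmpdir, 'toy-*.csv'))

    # Start from zeros with the same shape as each file's data.
    toy_composite = numpy.zeros((2, 3))
    for f in toy_filenames:
        # Add this file's values, element by element, to the running total.
        toy_composite += numpy.loadtxt(fname=f, delimiter=',')

    # Divide the total by the number of files to get the element-wise average.
    toy_composite /= len(toy_filenames)

print(toy_composite)
```

<p>The order in which the files are read does not matter here, because addition gives the same total in any order. That is why the sketch does not sort the list of filenames.</p>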
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/python-novice-inflammation">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
<script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-37305346-2', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>