<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Final Project | IDS 705</title>
<!-- bootstrap -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css">
<!-- Google fonts -->
<link href='https://fonts.googleapis.com/css?family=Roboto:700,400,300' rel='stylesheet' type='text/css'>
<!-- Google Analytics -->
<script>
</script>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<div id="header">
<a href="index.html">
<h1>IDS 705: Principles of Machine Learning</h1>
</a>
<div class='text-center'>
<h4>Duke University</h4>
<h4>Spring 2023</h4>
</div>
<div style="clear:both;"></div>
</div>
<div class="container sec">
<h1>Final Project</h1>
<h2>Summary and Goals</h2>
<p>Machine learning tools are not an end in themselves, but yield value when making predictions, quantifying and describing phenomena in the world around us, and in all these ways and more helping us to make decisions that would otherwise be difficult or impossible. For this final project, you will work in teams to (1) identify a problem to solve or a question to answer, (2) apply machine learning techniques to conduct experiments to address the issues identified in (1), (3) rigorously evaluate the performance of your approach, and (4) clearly communicate your findings to a wide audience. The <b>deliverables for this project</b> are:</p>
<ol>
<li><a class="toc" href="#proposal">Project proposal</a></li>
<li><a class="toc" href="#finalreport">Final written report</a> and a draft report prior to final submission</li>
<li><a class="toc" href="#video">Presentation</a> (in the form of a video). During our final class meeting we will have a project video showcase and competition.</li>
<li><a class="toc" href="#github">GitHub repository</a> for your project</li>
<li><a class="toc" href="#peerevaluation">Peer evaluation</a></li>
</ol>
Other topics described in this document related to the project include:
<ul>
<li><a class="toc" href="#learningobjectives">Learning objectives</a></li>
<li><a class="toc" href="#evaluation">Submission, Evaluation, & Grading</a></li>
<li><a class="toc" href="#ideas">Ideas for datasets</a></li>
<li><a class="toc" href="#faq">Frequently Asked Questions</a></li>
</ul>
</div>
<div id="learningobjectives" class="sechighlight">
<div class="container sec">
<h2>Learning Objectives</h2>
<p>This project is an opportunity to identify and deeply explore a question or problem of your choosing, using machine learning tools. A central component of your project must be a machine learning methodology. It does not have to be one that we've explicitly discussed in class: you're welcome to use the project to learn new topics, and equally welcome to gain greater experience with the techniques from class by putting them into practice. The objectives of this project are to...
<ol>
<li>Develop deeper competency in applying machine learning methods in practical applications</li>
<li>Gain experience in learning about a topic beyond what was explicitly discussed, building on the foundation you have developed throughout the course, which enables you to learn other machine learning concepts</li>
<li>Increase your experience with collaborative data science workflows</li>
<li>Expand your data science portfolio</li>
</ol>
In this project you will build on what you've learned throughout this course, applying the paradigms, algorithms, evaluation tools, and interpretation techniques we have discussed. I strongly encourage you to pick a project that is of genuine interest to you in some way (e.g. the application, the tools, the dataset, etc.). Learning comes from stretching yourself: this requires that you push yourself into some unfamiliar territory, which is often a challenge and leads to desirable difficulty. The best learning happens through this struggle, but it requires perseverance, which is easiest to sustain when you bring intrinsic motivation to the challenge. Find a topic of interest and embrace the challenge!
</p>
<p>There are (at least) three types of projects for you to consider proposing:
<ol>
<li><b>Solve a problem.</b> Identify a challenge for which a machine learning technique could be part of a solution and specify what that challenge is, what the inputs and desired outputs are, how you would measure success, and why you chose the specific goal you set for the project.</li>
<li><b>Answer a question.</b> Identify a question you're interested in answering using machine learning tools, decide which metrics you would use, and design the experiment you would need to run to answer it.</li>
<li><b>Design a tool.</b> Identify an objective that you wish to accomplish, and build and publish the tool so that you're ready to share it with the world and demo it by the time of your final presentation.</li>
</ol>
</p>
<h3>Requirements</h3>
<ul>
<li>The project must involve machine learning techniques or concepts (this could include concepts we were not able to cover in the course, but realize that doing so may increase the challenge of the project).</li>
<li>The project must be able to be completed within the course of this semester and should be scoped correctly: neither too easy given the number of members of your team, nor too difficult such that you will not have a finished product by semester's end.</li>
<li>Every project should involve reading about both your application domain and the methods that you're using, especially in technical reports (be wary of content on Medium and blogs). You'll need to provide context for your work, and demonstrate that you've learned more about your project topic through the course of your work.</li>
</ul>
</div>
</div>
<div id="proposal" class="container sec">
<h2>Proposal</h2>
<p>Your team will submit a short project proposal. You will receive written feedback that should be used to guide your project development and execution. There are no length requirements. Every proposal should have the title of the project and the list of team members at the top of the first page.</p>
<p>You can find the <a class="emphasis" href="https://docs.google.com/document/d/1DgPLjUB7VqBOdyYFWCZ6iYdqhyHp-etXAEgI3QIA__c/edit?usp=sharing">project proposal template and instructions here</a>. We recommend you think about and discuss the different points mentioned in the template prior to submission.</p>
<p>Additionally, content from your proposal may be reused in your draft/final report and so you're encouraged to invest in it with that in mind.</p>
<p>If you are looking for ideas about datasets, etc., please see the <a href="#ideas">Ideas section</a> below. Please stop by office hours if you would like to discuss specific project ideas or for any other help in selecting your project idea.</p>
</div>
<div id="finalreport" class="container sec">
<h2>Final Report</h2>
<p>The final project report that you submit will consist of two parts: (1) a draft project report and (2) a final report. The draft project report is your main opportunity to get detailed feedback on your report. While the draft report won't be graded, we will provide written feedback and suggestions in the form of Google Doc comments that we strongly recommend you address in your final report.</p>
<p>Please find the <a class="emphasis" href="https://docs.google.com/document/d/1aTpyQvFVl_F9BBoDc5Okf4mCWSlSqN5ETvHVXcbvFAA/edit?usp=sharing">instructions and template for the report here</a>.</p>
</div>
<div id="video" class="container sec">
<h2>Video</h2>
<p>You will also submit an up-to-4 minute video summarizing your project. This video should be visually compelling and should not miss the “forest for the trees” – don’t get lost in technical details. Imagine your aunt and uncle watching this video – would they know what is going on? Would they find it approachable and engaging? For inspiration for what makes a good explanatory video, watch videos from the following series:</p>
<ul>
<li><a href="https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg">Two Minute Papers</a> by Károly Zsolnai-Fehér. Concise 1-4 minute summaries of cutting edge research papers.</li>
<li><a href="https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw">3Blue1Brown</a> by Grant Sanderson. Mathematical concepts conveyed clearly, intuitively, and visually.</li>
<li><a href="https://www.youtube.com/channel/UConVfxXodg78Tzh5nNu85Ew">Welch Labs</a> by Stephen Welch. Series on machine learning, neural networks, and imaginary numbers.</li>
</ul>
<p>Once you're working on producing your video, ask your friends (especially those who may not be as technically inclined) for feedback. Do they think it was engaging/easy to follow? Ask them their takeaways: did they get the message you were trying to communicate? Address their feedback to help you ensure the quality of your video. You're encouraged to use the audio-visual medium to the fullest to clearly present your project.</p>
<p>You'll submit your video to the instructional team as either a live link (such as YouTube) or an .mp4 file (please test your file to make sure it plays before submitting).</p>
</div>
<div id="github" class="container sec">
<h2>GitHub Repository</h2>
<p>Your GitHub repository should (a) contain a descriptive README.md file that explains what the repo is for and how to use the code to reproduce your work (including how to set it up to run), (b) be well commented throughout all files, (c) list all dependencies in a requirements.txt file, (d) tell the user how to get the data and include all preprocessing code, and (e) actually run (i.e. we can successfully test it) and do what it says.</p>
<p>Also include a copy of your final report in the repository, and link to your project video from the README.md file.</p>
</div>
<div id="peerevaluation" class="container sec">
<h2>Peer Evaluation</h2>
<p>Since this is a team project, you will also receive feedback from your teammates AND reflect on your own performance in a self-evaluation. You will be evaluating your fellow team members on the following criteria:</p>
<ol>
<li>Was dependable in attending meetings to work on the project</li>
<li>Did work accurately and completely</li>
<li>Completed work on time</li>
<li>Contributed positively to team discussions</li>
<li>Helped others when needed</li>
<li>Responded to communications in a timely manner</li>
<li>Treated other team members respectfully</li>
<li>Demonstrated a positive attitude about the team and its work</li>
</ol>
<p>Your grade for this evaluation is NOT based directly on the scores that you receive in the feedback. Instead, a satisfactory peer and self-evaluation is assessed by how constructive the feedback you provide is. More detailed, constructive feedback is more useful in helping your peers understand their strengths and areas for growth. Giving that feedback respectfully and compassionately is a requirement. Your peers will receive anonymized versions of the feedback that you share.</p>
</div>
<div id="evaluation" class="container sec">
<h2>Submission, Evaluation, & Grading</h2>
<p>You should submit each deliverable from your project through Gradescope. With the exception of the final report, each team deliverable is submitted as a link. Each deliverable should be submitted AS A TEAM, not through individual submissions. The project proposal and draft final report should be submitted through Gradescope as links to Google Docs (so that we can attach easy-to-respond-to comments) using the templates provided. The video and GitHub repo should also both be submitted as links via Gradescope. The final project report, however, should be submitted as a PDF document in Gradescope.</p>
<p>The grading for this project will be assigned as follows:</p>
<table class="table table-hover">
<thead>
<tr>
<th scope="col">Component</th>
<th scope="col">Evaluation / Feedback Plan</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Presentation</th>
<td>5 points, graded</td>
</tr>
<tr>
<th scope="row">Final Report</th>
<td>20 points, graded</td>
</tr>
<tr>
<th scope="row">Team Proposal</th>
<td>Written feedback will be provided to help guide your project design.<sup>**</sup></td>
</tr>
<tr>
<th scope="row">Draft Final Report</th>
<td>Written feedback will be provided to help guide your final report writing.<sup>**</sup></td>
</tr>
<tr>
<th scope="row">GitHub Repository</th>
<td>Required for project submission to be considered complete.<sup>**</sup></td>
</tr>
<tr>
<th scope="row">Peer Evaluation</th>
<td>Required for project submission to be considered complete.<sup>**</sup></td>
</tr>
<tr>
<th scope="row">Total</th>
<td><b>25 points</b></td>
</tr>
</tbody>
</table>
<p><sup>**</sup> No points will be directly assigned for these deliverables. However, one point will be deducted from your overall final project score for each day a deliverable is late, and up to 2 points may be deducted from the overall project score if a deliverable is unsatisfactory (i.e. it does not represent a serious effort towards the deliverable).</p>
</div>
<div id="ideas" class="container sec">
<!-- <h2>Sample project ideas</h2>
<p><b>Example Project Idea #1: How well buildings be detected in satellite imagery across diverse geographies?</b> Satellite imagery is enabling us to create functional maps of the world based on the content in the images. Automating building identification could help map global population and analyze global population growth in real-time. However, different parts of the world look different: forests, deserts, plains, etc. Each location looks differently. This may impact the ability to train an algorithm on one location and test on another location. This project uses the INRIA building dataset to investigate the impact of different geographies on the performance of building detection and segmentation techniques using satellite imagery.</p>
<p><b>Example Project Idea #2: How well buildings be detected in satellite imagery across diverse geographies?</b> Satellite imagery is enabling us to create functional maps of the world based on the content in the images. Automating building identification could help map global population and analyze global population growth in real-time. However, different parts of the world look different: forests, deserts, plains, etc. Each location looks differently. This may impact the ability to train an algorithm on one location and test on another location. This project uses the INRIA building dataset to investigate the impact of different geographies on the performance of building detection and segmentation techniques using satellite imagery.</p> -->
<h2>Ideas</h2>
<ul>
<li><b>Participate in an active machine learning competition.</b> Online machine learning competitions are sponsored by organizations that care enough about a problem to put up prize money for a solution. Examples of competition platforms include <a href="https://www.kaggle.com/competitions">Kaggle</a>, <a href="https://www.drivendata.org/competitions/">DrivenData</a>, <a href="https://zindi.africa/competitions">Zindi</a>, <a href="https://www.aicrowd.com/">AIcrowd</a>, etc. If you choose to participate in a competition, it must be an active competition where your team can compete; it cannot be a "sample" competition that is only for learning to use the platform (e.g. the Kaggle Titanic competition, etc.). You will also want to learn about the application domain.</li>
<li><b>Design your own project based on a question, e.g. how well can buildings be detected in satellite imagery across diverse geographies?</b> Satellite imagery is enabling us to create functional maps of the world based on the content in the images. Automating building identification could help map global population and analyze global population growth in real-time. However, different parts of the world look different (forests, deserts, plains, etc.), which may affect the ability to train an algorithm on one location and test it on another. A project on this question could use the INRIA building dataset to investigate the impact of different geographies on the performance of building detection and segmentation techniques using satellite imagery.</li>
<li><b>Reproduce the work of a published study and build on it.</b> Reproducing the results of a journal article can be a great way to dive into advanced material. The goal for a project like this would be to reproduce the study and build on it in some way: test a new hypothesis, adjust the methodology, or try it on other data that may present new and interesting challenges. Reproducing papers can be hard, so you'll want to choose wisely and make clear what your innovation is.</li>
<li><b>Build your own tool.</b> Great value can come from making a tool available for use, but building the infrastructure is a challenge. You may want to create a chatbot that creates poetry based on themes that you feed in, or design a search tool that scans satellite data of the Earth for signs of natural disasters. The key here is that your tool will need to be functional and usable by your target audience.</li>
</ul>
<!-- <p>As you're developing ideas for your project, explore active competitions on <a href="https://www.aicrowd.com/">AICrowd</a>, <a href="https://zindi.africa/competitions">Zindi</a>, <a href="https://www.kaggle.com/competitions">Kaggle</a>, <a href="https://www.drivendata.org/competitions/">DrivenData</a>, and other machine learning competition pages. You can use these competitions as a starting point for a project. Additionally, you may want to be inspired by projects in the community, for example, the <a href="https://www.itu.int/en/ITU-T/AI/Pages/ai-repository.aspx">AI for Good repository</a> has a number of projects from which to draw inspiration.</p>
<p><b>What makes for an interesting dataset to explore?</b> The dataset generally needs to have enough samples, features, and labels to enable a meaningful analysis. This rules out options like the Iris, Titanic, and all other "introductory" datasets for which you can find dozens of numerous tutorials walking through the analysis. You want to be able to journey into the unknown of the data: be bold and pick a dataset and application that excites you!</p>
<p><b>Potential sources for datasets:</b></p>
<ul>
<li><a href="https://www.datasetlist.com/">Machine learning datasets</a> <b>(start your search here)</b></li>
<li><a href="https://registry.opendata.aws/">Amazon AWS Open Datasets</a></li>
<li><a href="https://toolbox.google.com/datasetsearch">Google Dataset Search</a></li>
<li><a href=" https://msropendata.com/">Microsoft Research Open Data</a></li>
<li><a href="https://github.com/awesomedata/awesome-public-datasets">Awesome public datasets</a></li>
<li><a href="https://github.com/openimages/dataset">Google's Open Images Dataset</a></li>
<li><a href="https://research.google.com/youtube8m/">Youtube labeled video dataset</a></li>
<li><a href="http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/">Wikipedia Text Dataset</a></li>
<li><a href=" https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research">Wikipedia list of machine learning datasets</a></li>
<li><a href="https://www.kaggle.com/datasets">Kaggle Datasets</a></li>
<li><a href=" https://research.google.com/youtube8m/">Youtube labeled video dataset</a></li>
<li><a href="https://github.com/chrieke/awesome-satellite-imagery-datasets"></a>Satellite Imagery Datasets</li>
</ul> -->
</div>
<div id="faq" class="container sec">
<!-- <h2>Frequently Asked Questions</h2>
<h3>Does our project application need to be novel?</h3>
<p>No. While novel ideas are certainly welcome and encouraged, your project does not need to something that has never been done before. In fact, reproducing a past research paper (from a reputable journal) or exceptional projects can be an excellent way to develop your skills and learn good experimental practices along the way. However, you should not simply take an existing repository, hit "run" and call that your project, of course, you will need to make it your own - ask some additional questions, try to modify the methods, etc.</p> -->
</div>
<div class="sechighlight">
<div id="footer">
<div id="footermsg">Website design inspired by the <a href="http://cs231n.stanford.edu/">Stanford CS231n course page</a></div>
</div>
</div>
<!-- jQuery and Bootstrap -->
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</body>
</html>