Christian-Albrechts-Universität zu Kiel, Germany
1 July 2020
CC-BY 4.0 | Feel free to share/photograph this presentation
- Why software best practices are important
- How investing in software can benefit science
- What you can do about it today
BSc in Geophysics from Universidade de São Paulo | MSc + PhD in Geophysics from Observatório Nacional in Rio
Many years of Python coding later and this code actually compiled on my first try!
Brief stint as a paleomagnetist getting stung by hornets. Don't judge the hair, I was 19.
Collaboration between Naomi Ussami (USP), Carla Braitenberg (Trieste), and Valéria Barbosa (ON).
Uieda, Barbosa, Braitenberg (2016) | doi:10.1190/geo2015-0204.1
Started in 2010 as a mixed bag of geophysics in Python.
First website and example gallery from 2011. (Google+ 😂)
In 2018, started a complete rewrite of Fatiando a Terra, breaking it into separate tools.
Postdoc at the University of Hawai'i working on the Generic Mapping Tools (GMT)
import pygmt
# Load built-in topography data
grid = pygmt.datasets.load_earth_relief()
fig = pygmt.Figure()
# Pseudo-color map of topography
fig.basemap(
    region=[-150, -30, -60, 60],
    projection="I-90/6i",
    frame=True,
)
fig.grdimage(grid=grid, cmap="viridis")
# Mask continents in dark grey
fig.coast(land="#333333")
# Display in Jupyter or pop-up window
fig.show()
My initial role in Hawai'i was creating PyGMT.
The first official release of PyGMT was managed by Wei Ji and Dongdong.
GMT was started in the 1980s by Paul Wessel and Walter Smith. Photo from the 2019 GMT Summit at Scripps.
- Lower barriers to contribution
- Automate as much as possible
- Nurture a community of users/developers
- Formalize project governance
- General house cleaning of the code
- New NSF grant to fund this 🎉 (ID: 1948602)
Proposal is public at doi.org/10.6084/m9.figshare.12235727
In 2019, started as Lecturer of Geophysics at the University of Liverpool
Data processing, analysis, visualization, inference, etc.
Computers are always involved somehow.
Machine learning is open-source:
- scikit-learn
- TensorFlow (Google)
- PyTorch (Facebook)
- RAPIDS (NVIDIA)
Image by Victor Grigas (CC-BY-SA)
Published in The Conversation (CC-BY-ND)
“The most serious was that, in their Excel spreadsheet, Reinhart and Rogoff had not selected the entire row when averaging growth figures...”
Published in The Conversation (CC-BY-ND). Emphasis is my own.
"So the key conclusion of a seminal paper, which has been widely quoted in political debates in North America, Europe, Australia and elsewhere, was invalid."
Published in The Conversation (CC-BY-ND). Emphasis is my own.
Published in Software Sustainability Institute blog
"When Ferguson tweeted on 22 March that he 'wrote the code (thousands of lines of undocumented C) 13+ years ago to model flu pandemics', the debate expanded to include the work's age, robustness and applicability to coronavirus."
Published in Software Sustainability Institute blog. Emphasis is my own.
Chawla (2020) | doi:10.1038/d41586-020-01685-y
"Influential model judged reproducible —
although software engineers called its code
'horrible' and 'a buggy mess'."
Chawla (2020) | doi:10.1038/d41586-020-01685-y. Emphasis is my own.
Paul's Google Scholar page tracks over 18000 citations related to GMT.
Bouman et al. (2016) | doi:10.1038/srep21050
"The signal has been calculated for the spherical geometry with the software Tesseroids"
Bouman et al. (2016) | doi:10.1038/srep21050
Code (built on Fatiando a Terra) and data published on GitHub. Uieda & Barbosa (2017) | doi:10.1093/gji/ggw390.
Studies using the code:
Antarctica (Chisenga et al., 2019; Pappa et al., 2019)
Egypt (Sobh et al., 2019)
Atlas (Ghomsi et al., 2019)
China (Chisenga and Yan, 2019)
Cameroon (Ghomsi et al., 2020)
Linear model used to make predictions:
- interpolation/gridding
- reduction-to-the-pole
- upward-continuation
- derivatives
- and more
Soler & Uieda (2020) | doi:10.5194/egusphere-egu2020-549.
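The idea of a linear model that is fitted once and then used for different predictions can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the implementation from the cited work: the function names (`jacobian`, `fit_sources`, `predict`), the 1/distance kernel, the source depth, the damping value, and the synthetic data are all choices made here for the example.

```python
import numpy as np


def jacobian(coordinates, source_coordinates):
    """Build the matrix of 1/distance between each point and each source."""
    # coordinates: (n_points, 3); source_coordinates: (n_sources, 3)
    difference = coordinates[:, np.newaxis, :] - source_coordinates[np.newaxis, :, :]
    return 1.0 / np.sqrt(np.sum(difference**2, axis=-1))


def fit_sources(coordinates, data, source_coordinates, damping=1e-8):
    """Estimate the source coefficients by damped least squares."""
    A = jacobian(coordinates, source_coordinates)
    hessian = A.T @ A + damping * np.identity(A.shape[1])
    return np.linalg.solve(hessian, A.T @ data)


def predict(coordinates, source_coordinates, coefficients):
    """Evaluate the fitted linear model at any set of points."""
    return jacobian(coordinates, source_coordinates) @ coefficients


# Synthetic example (made up for illustration): data on a 6 x 6 grid at
# height 0, generated by one point source 20 units below the grid centre.
east, north = np.meshgrid(np.linspace(0, 100, 6), np.linspace(0, 100, 6))
coordinates = np.column_stack([east.ravel(), north.ravel(), np.zeros(east.size)])
true_source = np.array([[50.0, 50.0, -20.0]])
data = jacobian(coordinates, true_source)[:, 0]

# One source 10 units below each data point (an arbitrary choice)
source_coordinates = coordinates - np.array([0.0, 0.0, 10.0])
coefficients = fit_sources(coordinates, data, source_coordinates)

# The same coefficients predict the field elsewhere, here 5 units higher,
# which corresponds to an upward continuation of the data.
upward = coordinates + np.array([0.0, 0.0, 5.0])
predicted_upward = predict(upward, source_coordinates, coefficients)
```

Gridding works the same way: call `predict` with the coordinates of a regular grid instead of `upward`.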
Block-averaging source positions can reduce the number of sources by 1/2 to 1/5 with the same interpolation accuracy.
Soler & Uieda (2020) | doi:10.5194/egusphere-egu2020-549.
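Block-averaging of positions can be sketched as follows. This is a NumPy-only illustration, not the code used in the cited work: the function name `block_average_positions`, the square blocks, the block size, and the random test points are assumptions made for the example.

```python
import numpy as np


def block_average_positions(easting, northing, spacing):
    """Return the mean position of the points inside each occupied block."""
    # Integer row/column of the square block that contains each point
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    # Combine both indices into a single label per occupied block
    _, labels = np.unique(i * (j.max() + 1) + j, return_inverse=True)
    counts = np.bincount(labels)
    mean_easting = np.bincount(labels, weights=easting) / counts
    mean_northing = np.bincount(labels, weights=northing) / counts
    return mean_easting, mean_northing


# Synthetic scattered points (made up for illustration)
rng = np.random.default_rng(0)
easting = rng.uniform(0, 100, size=1000)
northing = rng.uniform(0, 100, size=1000)

# One averaged position per 10 x 10 block instead of one per data point
mean_easting, mean_northing = block_average_positions(easting, northing, spacing=10.0)
print(f"{easting.size} data points -> {mean_easting.size} source positions")
```

How much the number of sources drops depends on the block size relative to the data spacing, so the block size has to be tuned for each dataset.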
Cross-validation is the gold standard in machine learning.
Overestimates accuracy scores for spatial data.
Block (spatial) cross-validation resolves this issue.
Roberts et al. (2017) | doi:10.1111/ecog.02881
Uieda & Soler (2020) | doi:10.5194/egusphere-egu2020-15729.
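A block K-fold split can be sketched with NumPy alone. The point is that whole spatial blocks, rather than individual points, are assigned to folds, so nearby points do not end up on both sides of a split. The names `block_labels` and `block_kfold`, the block size, and the random test points are assumptions for this example, not the implementation from the cited works.

```python
import numpy as np


def block_labels(easting, northing, spacing):
    """Label each point with the index of the square block that contains it."""
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    _, labels = np.unique(i * (j.max() + 1) + j, return_inverse=True)
    return labels


def block_kfold(easting, northing, spacing, n_splits=5, seed=0):
    """Yield (train, test) index arrays with whole blocks held out together."""
    labels = block_labels(easting, northing, spacing)
    n_blocks = int(labels.max()) + 1
    # Shuffle the blocks (not the points) and deal them into folds
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(n_blocks) % n_splits
    fold_of_point = fold_of_block[labels]
    indices = np.arange(easting.size)
    for k in range(n_splits):
        in_test = fold_of_point == k
        yield indices[~in_test], indices[in_test]


# Synthetic scattered points (made up for illustration)
rng = np.random.default_rng(42)
easting = rng.uniform(0, 100, size=500)
northing = rng.uniform(0, 100, size=500)

for train, test in block_kfold(easting, northing, spacing=10.0):
    print(f"train: {train.size} points, test: {test.size} points")
```

Each fold's training and test sets can then be passed to any model's fit and score steps as in ordinary K-fold cross-validation.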
Uieda et al. (2018) | doi:10.6084/m9.figshare.7440683.
Equivalent sources on large data:
- Parallel processing (done in part)
- Reduce memory usage (done in part)
- Efficient machine learning methods
- Multiple different datasets
- Scale in the cloud (Pangeo)
- In your own lab/department/university
- In grant evaluations, job searches, awards
- Be kind and respectful to developers
- Cite the software you use*
- Encourage others to do the same
* See "Software citation principles" by Smith et al. (2016)
Open-access (free) developer-friendly journal
I'm a topic editor for geophysics and there are many others | JOSS logo licensed CC-BY-4.0
- Software Carpentry
- AGU Workshops
- Software Sustainability Institute (SSI)
- US Research Software Sustainability Institute (URSSI)
- Better Scientific Software
Find your peers. Join online communities.
- Not just about code
- Documentation and reporting bugs
- Join the conversation (answer questions, etc)
- Look for projects with Contributing Guides
- Best way to learn software development
- Treat code as you would data: be skeptical, diligent, careful
- Learn "good-enough" practices to safely handle code
- One step at a time: do what can be done right now
- Value good software with credit, funding, and time
These slides (including links to everything) are available on my website