
Geophysical research powered by open-source

www.leouieda.com @leouieda

Christian-Albrechts-Universität zu Kiel, Germany

1 July 2020

CC-BY 4.0 | Feel free to share/photograph this presentation


  1. Why software best practices are important
  2. How investing in software can benefit science
  3. What you can do about it today

My Background

A tale of three projects


BSc in Geophysics from Universidade de São Paulo | MSc + PhD in Geophysics from Observatório Nacional in Rio


Many years of Python coding later and this code actually compiled on my first try!


Brief stint as a paleomagnetist getting stung by hornets. Don't judge the hair, I was 19.


Project #1

C command-line programs for gravity modelling

tesseroids.leouieda.com


Support for future GOCE data


Collaboration between
Naomi Ussami (USP),
Carla Braitenberg (Trieste),
and Valéria Barbosa (ON).

Uieda, Barbosa, Braitenberg (2016) | doi:10.1190/geo2015-0204.1


Project #2

Libraries for modelling, inversion, data processing, etc.

www.fatiando.org


Started in 2010 as a mixed bag of geophysics in Python.
First website and example gallery from 2011. (Google+ 😂)


Used extensively in teaching at UERJ and research at the PINGA Lab


In 2018, we started a complete rewrite of Fatiando a Terra,
breaking it into separate tools.


Breath of fresh air

  • PhD student from Argentina
  • Collaborating since 2015
  • Inspired me to write down my process
  • Leading some of our new packages (Harmonica and RockHound)
  • Main force behind many new developments

Postdoc at the University of Hawai'i working on the Generic Mapping Tools (GMT)


Project #3

Command-line tool for mapping/processing geophysical data

www.generic-mapping-tools.org


PyGMT: Bringing GMT to Python

```python
import pygmt

# Load built-in topography data
grid = pygmt.datasets.load_earth_relief()

fig = pygmt.Figure()
# Set up the map region, projection (sinusoidal, centred on 90°W, 6 inches wide) and frame
fig.basemap(
    region=[-150, -30, -60, 60],
    projection="I-90/6i",
    frame=True,
)
# Pseudo-color map of topography
fig.grdimage(grid=grid, cmap="viridis")
# Mask continents in dark grey
fig.coast(land="#333333")
# Display in Jupyter or pop-up window
fig.show()
```

My initial role in Hawai'i was creating PyGMT.


The first official release of PyGMT was managed by Wei Ji and Dongdong.

A community developed project

Contributors to v0.1.0:

  • Dongdong Tian
  • Wei Ji Leong
  • Leonardo Uieda
  • Liam Toney
  • Brook Tozer
  • Claudio Satriano
  • Cody Woodson
  • Mark Wieczorek
  • Philipp Loose
  • Kathryn Materna

GMT was started in the 1980s by Paul Wessel and Walter Smith. Photo from the 2019 GMT Summit at Scripps.


How Paul can retire in peace 🏝

  • Lower barriers to contribution
  • Automate as much as possible
  • Nurture a community of users/developers
  • Formalize project governance
  • General house cleaning of the code
  • New NSF grant to fund this 🎉 (ID: 1948602)

    Proposal is public at doi.org/10.6084/m9.figshare.12235727


In 2019, I started as a Lecturer in Geophysics at the University of Liverpool


Geophysics + Open-source

Building methods and software foundations to power

scalable gravity and magnetics processing and inversion


Code is essential to research

Data processing, analysis, visualization, inference, etc.


Computers are always involved somehow.

Machine learning is
open-source:

Image by Victor Grigas (CC-BY-SA)


Why best practices
are important

Horror stories of public embarrassment and backlash


Published in The Conversation (CC-BY-ND)


“The most serious was that, in their Excel spreadsheet, Reinhart and Rogoff had not selected the entire row when averaging growth figures...”

Published in The Conversation (CC-BY-ND). Emphasis is my own.


"So the key conclusion of a seminal paper, which has been widely quoted in political debates in North America, Europe, Australia and elsewhere, was invalid."

Published in The Conversation (CC-BY-ND). Emphasis is my own.


"When Ferguson tweeted on 22 March that he 'wrote the code (thousands of lines of undocumented C) 13+ years ago to model flu pandemics', the debate expanded to include the work's age, robustness and applicability to coronavirus."

Published in Software Sustainability Institute blog. Emphasis is my own.


Chawla (2020) | doi:10.1038/d41586-020-01685-y


"Influential model judged reproducible
although software engineers called its code
'horrible' and 'a buggy mess'."

Chawla (2020) | doi:10.1038/d41586-020-01685-y. Emphasis is my own.


Earth Science is also in the public gaze

Today's quick hack can become the foundation for

tomorrow's climate change policy.


Not all is grim

  • Good software gets used
  • Used software generates citations and collaborations
  • Potential impact of software is huge
  • It's not too late to start

Paul's Google Scholar page tracks over 18,000 citations related to GMT.


Open software
benefits science

Success stories of past, present, and future


Past


Bouman et al. (2016) | doi:10.1038/srep21050


"The signal has been calculated for the spherical geometry with the software Tesseroids"



Code (built on Fatiando a Terra) and data published on GitHub. Uieda & Barbosa (2017) | doi:10.1093/gji/ggw390.


Studies using the code:
Antarctica (Chisenga et al., 2019; Pappa et al., 2019)
Egypt (Sobh et al., 2019)
Atlas (Ghomsi et al., 2019)
China (Chisenga and Yan, 2019)
Cameroon (Ghomsi et al., 2020)



Present


Equivalent source processing

Linear model used to make predictions:

  • interpolation/gridding
  • reduction-to-the-pole
  • upward-continuation
  • derivatives
  • and more

Soler & Uieda (2020) | doi:10.5194/egusphere-egu2020-549.
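To make the idea concrete, here is a minimal NumPy sketch of equivalent sources. It assumes Cartesian coordinates in metres, a simple 1/distance Green's function, one point source placed a fixed depth below each observation point, and plain damped least squares. The function names (`greens_function`, `fit_sources`, `predict`) and the default depth are hypothetical, and this is not the Harmonica or Verde implementation. Once the coefficients are fitted, the predictions listed above amount, in principle, to evaluating the same linear model at new points or with a different kernel; the example does upward-continuation by predicting 500 m above the data.

```python
import numpy as np


def greens_function(easting, northing, upward, src_e, src_n, src_u):
    """1/distance kernel between every observation point and every source."""
    # Broadcasting gives an array of shape (n_points, n_sources)
    de = easting[:, None] - src_e[None, :]
    dn = northing[:, None] - src_n[None, :]
    du = upward[:, None] - src_u[None, :]
    return 1 / np.sqrt(de**2 + dn**2 + du**2)


def fit_sources(easting, northing, upward, data, depth=1000.0, damping=0.0):
    """Fit coefficients for one source placed `depth` metres below each point."""
    sources = (easting, northing, upward - depth)
    jacobian = greens_function(easting, northing, upward, *sources)
    n_sources = jacobian.shape[1]
    # Damped least squares written as an augmented system:
    # minimise ||J c - d||^2 + damping * ||c||^2
    matrix = np.vstack([jacobian, np.sqrt(damping) * np.identity(n_sources)])
    rhs = np.concatenate([data, np.zeros(n_sources)])
    coefs, *_ = np.linalg.lstsq(matrix, rhs, rcond=None)
    return sources, coefs


def predict(sources, coefs, easting, northing, upward):
    """Evaluate the fitted linear model at arbitrary points."""
    return greens_function(easting, northing, upward, *sources) @ coefs


# Synthetic data: field of one point source 5 km below a 10 x 10 grid
easting, northing = np.meshgrid(np.arange(10) * 1000.0, np.arange(10) * 1000.0)
easting, northing = easting.ravel(), northing.ravel()
upward = np.zeros_like(easting)
true_source = (np.array([4500.0]), np.array([4500.0]), np.array([-5000.0]))
data = greens_function(easting, northing, upward, *true_source)[:, 0]

sources, coefs = fit_sources(easting, northing, upward, data)
# Upward-continuation: evaluate the same model 500 m above the data
upward_continued = predict(sources, coefs, easting, northing, upward + 500.0)
```

Here `damping` is left at zero so the model fits the noise-free data exactly. With real, noisy data it would be positive, and its value (like the source depth) is a hyperparameter that has to be chosen, for example by cross-validation.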


Challenge: Computationally heavy

Block-averaging source positions can reduce the number of sources by a factor of 2 to 5 with the same interpolation accuracy.
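One way to do the block-averaging is sketched below, assuming Cartesian coordinates: split the horizontal plane into square blocks of a given `spacing` and place a single source at the mean position of the data points inside each non-empty block. The function name is hypothetical, and this is not necessarily how the cited work implements it (Verde's `BlockReduce` class provides a general version of this kind of operation).

```python
import numpy as np


def block_average_positions(easting, northing, upward, spacing):
    """Mean position of the points inside each non-empty square block."""
    # Integer index of the block containing each point, along each axis
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    # Combine the two indices into one unique label per block
    labels = i * (j.max() + 1) + j
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    # Sum coordinates per block with bincount, then divide by the point count
    return tuple(
        np.bincount(inverse, weights=coord) / counts
        for coord in (easting, northing, upward)
    )


# 1000 scattered points over a 10 km x 10 km area, 2 km blocks
rng = np.random.default_rng(0)
easting = rng.uniform(0, 10_000, 1000)
northing = rng.uniform(0, 10_000, 1000)
upward = np.zeros(1000)
block_e, block_n, block_u = block_average_positions(easting, northing, upward, 2000.0)
print(f"{easting.size} data points -> {block_e.size} sources")
```

The sources would then sit a fixed depth below these averaged positions, so the least-squares problem has one column per block instead of one per data point.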



Challenge: Source depth, damping, etc.

Cross-validation is the gold standard in machine learning for choosing such parameters.

Randomly split cross-validation overestimates accuracy scores for spatial data.

Block (spatial) cross-validation resolves this issue.

Roberts et al. (2017) | doi:10.1111/ecog.02881
Uieda & Soler (2020) | doi:10.5194/egusphere-egu2020-15729.
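A minimal sketch of block K-fold splitting, assuming Cartesian coordinates: points are assigned to square blocks, and whole blocks (not individual points) are shuffled and dealt out into folds, so no block contributes points to both the training and the testing set of a split. The function name and parameters are hypothetical; Verde offers `BlockKFold` and `BlockShuffleSplit` classes for this purpose, and the code below does not try to reproduce their exact behaviour.

```python
import numpy as np


def block_kfold(easting, northing, spacing, n_splits=5, seed=0):
    """Yield (train, test) index arrays, assigning whole blocks to folds."""
    # Label each point with the square block that contains it
    i = np.floor((easting - easting.min()) / spacing).astype(int)
    j = np.floor((northing - northing.min()) / spacing).astype(int)
    labels = i * (j.max() + 1) + j
    # Shuffle the blocks (not the points) and deal them out into folds
    blocks = np.unique(labels)
    rng = np.random.default_rng(seed)
    rng.shuffle(blocks)
    for fold_blocks in np.array_split(blocks, n_splits):
        in_test = np.isin(labels, fold_blocks)
        yield np.nonzero(~in_test)[0], np.nonzero(in_test)[0]


rng = np.random.default_rng(42)
easting = rng.uniform(0, 10_000, 500)
northing = rng.uniform(0, 10_000, 500)
folds = list(block_kfold(easting, northing, spacing=2000.0))
for train, test in folds:
    print(f"train: {train.size} points, test: {test.size} points")
```

Each `(train, test)` pair could be used to fit equivalent sources with a candidate depth and damping, score the predictions on the held-out blocks, and keep the combination with the best mean score.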


Beyond gravity and magnetics

Interpolate and merge 3D velocities from GPS and InSAR

Used to calculate strain rate for tectonics and geohazards

Large data volumes from InSAR

Uieda et al. (2018) | doi:10.6084/m9.figshare.7440683.
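Once the velocities are on a regular grid, the horizontal strain rate follows from their spatial derivatives: the strain-rate tensor is the symmetric part of the velocity gradient. The sketch below uses finite differences (`np.gradient`) and assumes a Cartesian grid with uniform spacing in metres, velocities in m/yr, and arrays of shape `(n_north, n_east)`. The function name and the synthetic velocity field are hypothetical, and this is not the workflow from the cited poster; on a longitude/latitude grid the spacing would have to be converted to metres and would vary with latitude.

```python
import numpy as np


def horizontal_strain_rate(v_east, v_north, spacing):
    """Horizontal strain-rate components from gridded velocities.

    Uses e_ij = (dv_i/dx_j + dv_j/dx_i) / 2. Grids have shape
    (n_north, n_east): axis 0 is northing and axis 1 is easting.
    """
    # np.gradient returns the derivative along axis 0, then along axis 1
    dve_dn, dve_de = np.gradient(v_east, spacing)
    dvn_dn, dvn_de = np.gradient(v_north, spacing)
    e_ee = dve_de
    e_nn = dvn_dn
    e_en = 0.5 * (dve_dn + dvn_de)
    return e_ee, e_nn, e_en


# Hypothetical linear velocity field on a 1 km grid (velocities in m/yr)
east, north = np.meshgrid(np.arange(5) * 1000.0, np.arange(4) * 1000.0)
v_east = 2e-8 * east
v_north = 3e-8 * north + 4e-8 * east
e_ee, e_nn, e_en = horizontal_strain_rate(v_east, v_north, spacing=1000.0)
```

Derived quantities such as the dilatation rate (`e_ee + e_nn`) follow directly from these three components.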


Future


Automated, parallel, scalable

Equivalent-sources on
large data:

  • Parallel processing (done in part)
  • Reduce memory usage (done in part)
  • Efficient machine learning methods
  • Multiple different datasets
  • Scale in the cloud (Pangeo)
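Two of the items above, parallel processing and reduced memory usage, can be illustrated together with a small sketch: instead of building the full `(n_points, n_sources)` matrix at once, predictions are computed in chunks, and the chunks are spread over a thread pool. It assumes the same hypothetical 1/distance kernel as a plain equivalent-source model; the chunk size and number of workers are illustrative, the speed-up from threads depends on how much time NumPy spends outside the GIL for a given problem, and the code is not meant to reflect how any particular package implements these features.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def predict_chunk(coordinates, sources, coefs):
    """Forward-model one chunk of points with a 1/distance kernel."""
    easting, northing, upward = coordinates
    src_e, src_n, src_u = sources
    distance = np.sqrt(
        (easting[:, None] - src_e) ** 2
        + (northing[:, None] - src_n) ** 2
        + (upward[:, None] - src_u) ** 2
    )
    return (1 / distance) @ coefs


def predict_chunked(coordinates, sources, coefs, chunk_size=1000, workers=4):
    """Predict in chunks so the full (n_points, n_sources) matrix never exists."""
    n_points = coordinates[0].size
    chunks = [
        tuple(c[start : start + chunk_size] for c in coordinates)
        for start in range(0, n_points, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the chunks
        results = pool.map(lambda chunk: predict_chunk(chunk, sources, coefs), chunks)
        return np.concatenate(list(results))


rng = np.random.default_rng(0)
coordinates = (rng.uniform(0, 1e4, 2500), rng.uniform(0, 1e4, 2500), np.zeros(2500))
sources = (rng.uniform(0, 1e4, 50), rng.uniform(0, 1e4, 50), np.full(50, -1000.0))
coefs = rng.normal(size=50)
predicted = predict_chunked(coordinates, sources, coefs, chunk_size=300, workers=3)
```

Peak memory for the kernel matrix then scales with `chunk_size * n_sources` per worker rather than `n_points * n_sources`.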

Built on open-source:


What you
can do now

(with limited time and money)


Value software work

  1. In your own lab/department/university
  2. In grant evaluations, job searches, awards
  3. Be kind and respectful to developers
  4. Cite the software you use*
  5. Encourage others to do the same

* See "Software citation principles" by Smith et al. (2016)


Get credit for your code

Open-access (free) developer-friendly journal

joss.theoj.org

I'm a topic editor for geophysics and there are many others | JOSS logo licensed CC-BY-4.0


Training


Connect

Find your peers. Join online communities.

"The place for scientists that like rocks and computers"

softwareunderground.org


Contribute to a project

  • Not just about code
  • Documentation and bug reports count too
  • Join the conversation (answer questions, etc.)
  • Look for projects with contributing guides
  • Best way to learn software development

Conclusions


Main takeaways

  1. Treat code as you would data
    be skeptical, diligent, careful

  2. Learn "good-enough" practices
    to safely handle code

  3. One step at a time
    do what can be done right now

  4. Value good software
    with credit, funding, and time


Contact

These slides (including links to everything) are available on my website


This presentation is licensed under

Creative Commons Attribution 4.0 International