## Script for listing all GLOBE schools with at least 10 000 rows of data

---

I wrote this script because I wanted a more accurate list of GLOBE schools with many measurements. At first, I looked at the official metrics (https://www.globe.gov/about/impact-and-metrics/schools-with-many-measurements). Unfortunately, the counts shown there differ from the counts on the schools' own official pages.

I asked for an explanation on the GLOBE forum, and I got the following response:

> "Different areas of the website count measurements differently.  Some places count the number of rows of data across several database tables.  Other places count specific table cells in select tables.  Depending on the protocol there are occasions where just specific samples are counted.  The https://www.globe.gov/about/impact-and-metrics/schools-with-many-measurements advertises "Approximate number of measurements" so it is not meant to be exact counts."

So, in the end, I wrote this simple script, which lists all schools with >= 10 000 data rows, ordered by the data count shown on each school's main page.

Important note: Citizen science national accounts are not included in this ranking.
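
In the run recorded in this notebook, the citizen science accounts drop out only because parsing their pages raises an `IndexError`. As a minimal sketch, they could also be separated explicitly before any request is made. The helper name `split_citizen_science` and the `-citizen-science` suffix are assumptions: the suffix is inferred from the four URLs reported as failing in this notebook, not from any documented GLOBE naming rule.

```python
def split_citizen_science(links, suffix="-citizen-science"):
    """Separate school links from citizen science account links.

    The suffix is an assumption inferred from the URLs this notebook reports
    as failing; it is not a documented GLOBE naming rule.
    """
    schools, accounts = [], []
    for link in links:
        # Ignore a trailing slash so both URL spellings are treated alike.
        target = accounts if link.rstrip("/").endswith(suffix) else schools
        target.append(link)
    return schools, accounts


sample_links = [
    "https://www.globe.gov/web/dsmweather",
    "https://www.globe.gov/web/canada-citizen-science",
]
schools, accounts = split_citizen_science(sample_links)
print(schools)
print(accounts)
```

Keeping the skipped links in a second list, rather than discarding them, makes it easy to check that nothing other than the national accounts was filtered out.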

In [2]:
from bs4 import BeautifulSoup
import requests

In [3]:
url_list = ["https://www.globe.gov/about/impact-and-metrics/schools-with-many-measurements/100000-499999",
            "https://www.globe.gov/about/impact-and-metrics/schools-with-many-measurements/50000-99999",
            "https://www.globe.gov/about/impact-and-metrics/schools-with-many-measurements/10000-49999"]

In [4]:
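def get_list_of_globe_schools():
    """Collect the school page URLs linked from every metrics table in url_list."""
    list_of_globe_schools = []

    for url in url_list:
        page = requests.get(url)
        # Name the parser explicitly so BeautifulSoup does not have to guess one.
        soup = BeautifulSoup(page.text, "html.parser")

        for table in soup.find_all('table'):
            # A separate loop variable avoids shadowing the page URL above.
            for anchor in table.find_all('a'):
                list_of_globe_schools.append(anchor.get('href'))

    return list_of_globe_schools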

In [10]:
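def get_number_of_data(list_of_globe_schools):
    """Return (school name, data count) tuples sorted by count, largest first."""
    final_data = []
    for link in list_of_globe_schools:
        try:
            page = requests.get(link)
            # Drop the 26-character prefix "https://www.globe.gov/web/" to keep the school name.
            name = link[26:]
            soup = BeautifulSoup(page.text, "html.parser")
            # The count is read from the second "participation-count-column" div.
            statistics = soup.find_all("div", class_="participation-count-column")[1]
            number_of_observations = statistics.text.strip().split("\n")
            final_data.append((name, int(number_of_observations[0])))
        except IndexError as e:
            # Pages without a second such div are reported and skipped.
            print(link, e)

    final_data.sort(key=lambda x: x[1], reverse=True)

    return final_data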
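
The slice `link[26:]` works only while every link starts with exactly `https://www.globe.gov/web/`. A minimal sketch of a less position-dependent alternative, using the standard library's `urllib.parse`, is shown here. The helper name `school_name_from_url` is hypothetical, and taking the last path segment is an assumption about the URL layout: it gives the same result as the slice for the links seen in this notebook, but would differ for a link with extra path segments after the school name.

```python
from urllib.parse import urlparse


def school_name_from_url(link):
    """Return the last path segment of a school page URL."""
    path = urlparse(link).path
    # Strip a trailing slash so ".../web/dsmweather/" still yields "dsmweather".
    return path.rstrip("/").rsplit("/", 1)[-1]


print(school_name_from_url("https://www.globe.gov/web/dsmweather"))
print(school_name_from_url("https://www.globe.gov/web/273129"))
```

This version does not depend on the scheme or host length, so a link served over `http://` or without the `www.` prefix would still produce the same name.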

In [6]:
list_of_globe_schools=get_list_of_globe_schools()

In [11]:
final_data=get_number_of_data(list_of_globe_schools)

https://www.globe.gov/web/united-states-of-america-citizen-science list index out of range
https://www.globe.gov/web/canada-citizen-science list index out of range
https://www.globe.gov/web/india-citizen-science list index out of range
https://www.globe.gov/web/thailand-citizen-science list index out of range


In [12]:
for name, count in final_data:
    print(name, count)

dsmweather 39872122
mittelschule-elsterberg 30705310
earth-networks-globe-v-school 20552483
globe-one-automated-weather-stations 5375526
university-of-cologne-glidt256- 4836481
norfork-elementary-school 4699369
sumarska-i-drvodjeljska-skola 4517244
missouri-globe-v-school 2288004
hills-home-school 2160953
edmund-burke-school 1916759
gimnazium-in-toszek 1672460
southern-connecticut-state-university-usctnilj- 1613664
university-of-toledo-school 1592436
ramey-school 1579013
ncar-foothills-lab 1523044
littleton-middle-school 1451041
palmyra-cove-nature-park-pcnp- 1400702
virginia-museum-of-natural-history1 1349268
o.j.roberts-middle-school 1339673
trinity-school 1328912
273129 1306145
mcknight-middle-school 1286941
mahopac-high-school 1156424
stone-child-college-usmtgcz3- 1155821
north-shore-hebrew-academy-high-school 1135425
238824 1125782
lourdes-public-charter-school 1113176
united-states-of-america-globe-v-school 1042375
312849 1009488
ruth-cherry-intermediate-school 998149
northland-p