## Postdoc positions on AAS Job Register (filtered and sorted by deadline)

This notebook scrapes the postdoc listings from the AAS Job Register, keeps those whose application deadline falls within a chosen window, and sorts them by deadline.

1. **The `AASJobTable` Class**:
    - This class is responsible for extracting and processing job listing tables from the AAS job register.
    - Methods include:
      - `__init__(self, link)`: Initialize with the job register link.
      - `process_table(self, table)`: Process the HTML table to extract data and convert it into a pandas DataFrame.
      - `retrieve_table(self)`: Retrieve the HTML table from the provided link.
      - `filter_by_date(self, min_date, max_date, date_kw)`: Keep the rows whose `date_kw` column is later than `min_date` and no later than `max_date`, then sort them by that column.

2. **Filtering**:
    - Set `Nmin` and `Nmax`, the number of days from today that bound the earliest and latest deadlines to keep.
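
As a minimal sketch of how the two offsets become a deadline window, the snippet below fixes "today" at a hypothetical 2024-12-10 and filters a small invented DataFrame instead of the live job register. The names `today`, `jobs`, and `selected` are illustrative only and do not appear in the notebook.

```python
from datetime import date, timedelta

import pandas as pd

# Hypothetical reference date; the notebook itself uses the real current date.
today = date(2024, 12, 10)
Nmin, Nmax = 9, 40

# Window bounds as 'YYYY-MM-DD' strings.
min_date = (today + timedelta(days=Nmin)).isoformat()
max_date = (today + timedelta(days=Nmax)).isoformat()

# Toy listings (invented for illustration) with deadlines in YYYY/MM/DD form.
jobs = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Deadline": ["2024/12/15", "2024/12/20", "2025/02/01"],
})
jobs["Deadline"] = pd.to_datetime(jobs["Deadline"], format="%Y/%m/%d")

# Same comparison as filter_by_date: strictly after min_date,
# up to and including max_date, then sorted by deadline.
mask = (jobs["Deadline"] > pd.Timestamp(min_date)) & (jobs["Deadline"] <= pd.Timestamp(max_date))
selected = jobs.loc[mask].sort_values(by="Deadline")
print(selected)
```

Note that the lower bound is exclusive, so a deadline falling exactly on `min_date` is dropped.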

The output is a table of the listings whose deadlines fall in that window, sorted by deadline, with a clickable link to each ad.
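
The core idea behind `process_table` is to record, for every table cell, both its text and the absolute URL of any link it contains. The sketch below illustrates that idea with the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra dependencies. The `CellLinkParser` class, the sample HTML, and the `/jobregister/ad/example` path are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CellLinkParser(HTMLParser):
    """Collect (text, absolute_link_or_None) pairs for every <td> cell."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []          # one list of (text, link) tuples per <tr>
        self._in_cell = False
        self._text = []
        self._link = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell, self._text, self._link = True, [], None
        elif tag == "a" and self._in_cell and self._link is None:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._link = urljoin(self.base_url, href)

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.rows[-1].append(("".join(self._text), self._link))
            self._in_cell = False


# Invented sample row: one linked cell and one plain-text cell.
html = """
<table>
  <tr><td><a href="/jobregister/ad/example">Example Fellowship</a></td><td>2024/12/20</td></tr>
</table>
"""
parser = CellLinkParser("https://aas.org/jobregister")
parser.feed(html)
print(parser.rows)
```

Cells without a link carry `None` in the second slot, which is what later allows the all-empty `*_link` columns to be dropped.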

In [None]:
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from urllib.request import Request, urlopen
from urllib.parse import urljoin
from IPython.display import display, HTML

# Helper function to get the date N days from now
def N_days_from_now(N:int):
    # Add N days to today's date using timedelta
    new_date = datetime.today() + timedelta(days=N)
    # Return the result as a 'YYYY-MM-DD' string
    return new_date.strftime('%Y-%m-%d')


# Class to extract tables from AAS job listings
class AASJobTable:
    def __init__(self,link:str):
        self.link = link
        self.headers = {'User-Agent': 'Mozilla/5.0'}

    def process_table(self,table):
        # Extract headers if available
        headers = []
        for th in table.find_all('th'):
            headers.append(th.get_text(strip=True))     

        data = []

        for row in table.find_all('tr')[1:]:  # Skip the first row if it contains headers
            cells = row.find_all(['td', 'th'])  # Include 'th' if headers are within rows
            if not cells:
                continue  # Skip empty rows
            row_data = []
            for cell in cells:
                # Check if the cell contains a link
                link_tag = cell.find('a', href=True)
                if link_tag:
                    # Extract the link and convert it to an absolute URL
                    link = urljoin(self.link, link_tag['href'])
                    # Optionally, extract the link text
                    link_text = link_tag.get_text(strip=True)
                    # Store the link or both link and text
                    row_data.append(link_text)   # Append link text
                    row_data.append(link)        # Append link URL
                else:
                    # If there's no link, extract the cell text
                    text = cell.get_text(strip=True)
                    row_data.append(text)
                    row_data.append(None)  # No link here
            data.append(row_data)

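        # Each header yields two columns: the cell text and the cell link
        column_names = []
        for header in headers:
            column_names.append(header)
            column_names.append(f"{header}_link")

        # Create the DataFrame
        df = pd.DataFrame(data, columns=column_names)
        # Drop columns that are empty for every row (typically unused link columns)
        df = df.dropna(axis=1, how='all')
        for header in headers:
            link_col = f"{header}_link"
            if link_col in df.columns:
                # Wrap each URL in an HTML anchor; leave missing links untouched
                df[link_col] = df[link_col].apply(
                    lambda x: f'<a href="{x}" target="_blank">{x}</a>' if pd.notna(x) else x)
        return df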

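    def retrieve_table(self):
        # Download the page, sending a browser-like User-Agent header
        req = Request(self.link, headers=self.headers)
        webpage = urlopen(req).read()

        # Parse the HTML and take the first table on the page
        soup = BeautifulSoup(webpage, "html.parser")
        table = soup.find("table")
        if table is None:
            raise ValueError(f"No <table> found at {self.link}")
        df = self.process_table(table)

        self.dataframe = df.copy(deep=True)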
            
    def filter_by_date(self, min_date:str, max_date:str='2100-01-01', date_kw:str='Deadline'):
        self.dataframe[date_kw] = pd.to_datetime(self.dataframe[date_kw],format='%Y/%m/%d')
        
        mask = (self.dataframe[date_kw] > min_date) & (self.dataframe[date_kw] <= max_date)
        self.dataframe = self.dataframe.loc[mask]
        self.dataframe = self.dataframe.sort_values(by=date_kw)

In [None]:
# Number of days from today (for the deadline filter)
Nmin = 9
Nmax = 40

# Link to the AAS job register
link = 'https://aas.org/jobregister?f%5B0%5D=category%3A511'

In [3]:
# Get dates for filtering
mindate = N_days_from_now(Nmin)
maxdate = N_days_from_now(Nmax)

In [4]:
# Create the table object
T = AASJobTable(link)
# Retrieve the table
T.retrieve_table()
# Filter by date
T.filter_by_date(mindate,maxdate)

***

In [5]:
print(f'Jobs with deadlines between {mindate} and {maxdate}')

# Display the DataFrame as HTML in Jupyter Notebook
display(HTML(T.dataframe.to_html(escape=False)))

Jobs with deadlines between 2024-12-19 and 2025-01-19


Unnamed: 0,Title,Title_link,Institution,Location(s),Posted,Deadline
126,"Astrophysicist, (Postdoctoral Research Fellow)",https://aas.org/jobregister/ad/5b4e1206,Center for Astrophysics | Harvard & Smithsonian,"Cambridge,MAUnited States",2024/10/24,2024-12-20
130,Postdoctoral Appointment in Exoplanetary Science,https://aas.org/jobregister/ad/c665aa68,Massachusetts Institute of Techology,"Cambridge,MAUnited States",2024/10/22,2024-12-20
124,MCSS Postdoctoral Research Fellowship,https://aas.org/jobregister/ad/c19e392a,Washington University in St. Louis,"St. Louis,MOUnited States",2024/10/24,2024-12-20
51,Postdoctoral Position - High-resolution spectroscopic characterization of hot Jupiter atmospheres,https://aas.org/jobregister/ad/f2058039,"University of Michigan, Ann Arbor",,2024/11/25,2024-12-20
50,NCN-funded post-doc positions (OPUS) in stellar astrophysics (1+1 year),https://aas.org/jobregister/ad/533f2330,"Nicolaus Copernicus University in Torun, Poland",,2024/11/25,2024-12-20
69,Postdoctoral position in stellar populations,https://aas.org/jobregister/ad/33c86b3b,Masaryk University,BrnoCzechia,2024/11/19,2024-12-20
72,Postdoctoral fellow in Astrophysics,https://aas.org/jobregister/ad/82189cbb,"Lund University, Sweden",,2024/11/18,2024-12-20
77,Postdoctoral Research Associates in Local Volume Galaxy Archaeology,https://aas.org/jobregister/ad/34fdafd4,University of Edinburgh,EdinburghUnited Kingdom,2024/11/13,2024-12-20
79,Postdoctoral Research Associate in Planet Formation,https://aas.org/jobregister/ad/4b6bfc61,Iowa State University,"Ames,IAUnited States",2024/11/13,2024-12-20
63,Postdoctoral Research Associate in Galaxy Formation,https://aas.org/jobregister/ad/cff8aad3,University of Edinburgh,EdinburghUnited Kingdom,2024/11/20,2024-12-20
