
GSoC 2023 Aritra Sinha

Nabil Freij edited this page Feb 22, 2024 · 2 revisions

Improving the Solar Feature Representation for Heliophysics Event Knowledgebase (HEK)

About Me

I am a final year Electronics and Communication Engineering student from the National Institute of Technology Karnataka, Surathkal, India. I am a programming enthusiast with a profound interest in Backend Development and Machine Learning. I am passionate about turning my ideas into code and building new things. Apart from programming, I like reading crime thrillers in my free time.

General Information:

Project Description and Proposed Solution

Project Overview

Sunpy provides access to solar feature and event data held by the Heliophysics Event Knowledgebase (HEK). Features and events have many properties. Some are specific to one type of event or feature, while others are global properties shared by all features and events; together they help to characterise, categorise, and distinguish them. Examples of global attributes are Event_Type (the type of the event), Event_Coordsys (the coordinate system used for the event), Event_CoordUnit (the unit of the coordinates), Event_StartTime (the time when the event starts), and Event_EndTime (the time when the event ends). In contrast, properties such as CME_RadialLinVel (the radial velocity from a linear fit, relevant to Coronal Mass Ejections, or CMEs, only) and AR_SpotAreaRaw (the area of Active Regions (ARs) in the plane of the sky) are event- or feature-specific. A detailed list and specification of all the feature and event properties is given here.

Problem Description

The current implementation and representation of HEK features and events involves the following classes, modules, or files:

  • sunpy.net.hek.hek.HEKClient: This class (a subclass of sunpy.net.base_client.BaseClient) communicates with the HEK (more specifically, the HER API) to fetch solar data matching the parameters and criteria given by the user.
  • sunpy.net.hek.hek.HEKTable: A container (a subclass of an astropy table) that helps us handle the data returned from the HEK after making a query.
  • sunpy.net.hek.hek.HEKRow: A subclass of astropy.table.Row that helps us handle the response from the HEK. The column-row key-value pairs correspond to the feature/event properties and their values. It also has additional properties that relate HEK concepts to VSO (Virtual Solar Observatory) concepts.
  • sunpy/tools/hektemplate.py: Because of the large number of properties and the complexity with which they are associated with the different features and events in the HEK, it is hard to implement the classes by hand and associate them with the correct attributes. The whole process has therefore been automated with a code-generation Python script that generates the classes for all the events and features. This file provides the base classes for Events and Features, and wrapper classes for the properties so that operator overloading (dunder/magic methods) can be used for comparisons while making queries to the API.
  • sunpy/tools/hek_mkcls.py: This script uses a manually maintained dictionary that maps properties to attributes to generate the source code for the classes representing the different features, events, and their properties, and writes it into a target file (we use sunpy/net/hek/attrs.py as the target file).

This implementation has several problems that need to be addressed to make things simpler, more automated, and easier to understand. The issues are:

Astropy Units not used while parsing to Tables

The HEKClient.search method uses the HEKClient._download method to fetch the data using the HEK API and then stores it in an astropy table (astropy.table.QTable). For many numerical (float, integer, or long) properties, such as PeakPower (peak power of oscillation) and Outflow_Speed (speed of the outflow), the unit is available to us in the form of another property (PeakPowerUnit for PeakPower and Outflow_SpeedUnit for Outflow_Speed). Even though columns in astropy tables can carry units, the units are stored as separate columns, which makes a lot of the data redundant. They need to be integrated as astropy units into the columns they describe.

The idea is to remove the columns which signify the unit of any other property from the astropy tables created and use the information they provide to assign a unit to the columns they are referring to. This would help us significantly reduce the data to be stored and its redundancy, too, making things simpler.

Increase in complexity due to usage of the same wrapper function for all types of attributes.

Queries made with the HEKClient.search method support many conditions besides equality (Greater than, Less than, And, Or, Like, etc.). These are made possible by creating classes for the different Features, Events, and the Properties assigned to them, along with some wrapper classes, so that magic/dunder methods can be used for operator overloading and to define the different operations on them. One important point to notice here is that all the properties of the various Events and Features are objects of the wrapper class _StringParamAttrWrapper, irrespective of their actual data type.
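The operator-overloading mechanism described above can be illustrated with a simplified, self-contained sketch. The classes ParamAttr and StringStyleWrapper are hypothetical stand-ins written for this example, not Sunpy's actual classes; they only show how a comparison can build a query condition instead of returning a boolean, and how a single string-style wrapper exposes the same operations for every property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamAttr:
    """Hypothetical stand-in for one query condition: (name, operator, value)."""
    name: str
    op: str
    value: object


class StringStyleWrapper:
    """Hypothetical wrapper used for every property, whatever its data type."""

    def __init__(self, name):
        self.name = name

    # Each comparison returns a query condition rather than True/False.
    def __lt__(self, other):
        return ParamAttr(self.name, "<", other)

    def __gt__(self, other):
        return ParamAttr(self.name, ">", other)

    def like(self, other):
        # String matching is offered even when the property is numerical.
        return ParamAttr(self.name, "like", other)


peak_flux = StringStyleWrapper("FL_PeakFlux")
condition = peak_flux > 1000
print(condition)
```

Because one wrapper serves every property, nothing stops a caller from writing peak_flux.like("abc") for a numerical property, which is the lack of distinction discussed here.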

The idea here is to have separate wrapper classes for different data types, which will help us simplify the implementation of their operator functions and incorporate the associated units. This will enable us to remove all the properties that just provide the information regarding units of other properties, thus significantly reducing the number of properties and attribute objects, making things easier for us to maintain.

Proposed Solutions

Integrating Astropy Units to the results

Fetching data through the HEK API and converting it into an HEKTable instance (a subclass of sunpy.net.base_client.QueryResponseTable, which in turn is derived from astropy.table.table.QTable) starts inside the HEKClient.search function, which takes the query as an argument and converts it into a JSON object. It then calls the HEKClient._download method with the JSON query object as an argument. Inside _download, the JSON query is appended as a query string to the HEK URL and a GET request is made to fetch the data. Once the results are fetched, the whole list of JSON objects is parsed into an astropy table and returned. The unit columns are passed into the table along with the other properties.
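The request-building and response-handling steps can be sketched without touching the network. The endpoint URL, the parameter names, and the "result" key of the response are assumptions made for this illustration, and build_query_url is a hypothetical helper rather than part of HEKClient.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
HEK_URL = "https://www.lmsal.com/hek/her"


def build_query_url(params, base_url=HEK_URL):
    """Append the query parameters to the base URL as a query string."""
    return f"{base_url}?{urlencode(params)}"


# Assumed parameter names describing a search for flare ("FL") events.
query = {
    "cmd": "search",
    "type": "column",
    "event_type": "FL",
    "event_starttime": "2011-08-09T07:23:56",
    "event_endtime": "2011-08-09T12:40:29",
}
url = build_query_url(query)

# A canned response standing in for what the GET request would return.
canned_response = '{"result": [{"event_type": "FL", "fl_peakflux": 1500.0}]}'
events = json.loads(canned_response)["result"]
print(url)
print(events)
```

Each dictionary in events corresponds to one row of the table that is eventually returned.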

I intend to create a JSON file, hek_properties.json, that would hold all the information on the HEK properties and attributes. This includes the description of the property, the data type of the property, and the property telling us about its unit. The JSON file would have elements looking like this:

{
  "obs_meanwavel": {
    "type": "float",
    "desc": "Mean wavelength (preferably in Angstroms)",
    "unit_prop": "obs_wavelunit"
  }
}
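A minimal sketch of how such a file could be consulted is given below. The entry is embedded as a string so that the example is self-contained; in practice it would be read from hek_properties.json. The helper names unit_property_for and unit_defining_properties are hypothetical.

```python
import json

# Inline copy of the example entry shown above.
HEK_PROPERTIES_JSON = """
{
  "obs_meanwavel": {
    "type": "float",
    "desc": "Mean wavelength (preferably in Angstroms)",
    "unit_prop": "obs_wavelunit"
  }
}
"""

properties = json.loads(HEK_PROPERTIES_JSON)


def unit_property_for(name, properties):
    """Return the name of the property holding the unit of `name`, or None."""
    return properties.get(name, {}).get("unit_prop")


def unit_defining_properties(properties):
    """Collect every property whose only role is to give another one's unit."""
    return {
        spec["unit_prop"] for spec in properties.values() if spec.get("unit_prop")
    }


print(unit_property_for("obs_meanwavel", properties))
print(unit_defining_properties(properties))
```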

The JSON file could now be used to fetch the unit properties for any attribute. Other than that, this file would also serve other purposes, which will be mentioned later. The next step involves the conversion of all those numerical properties into astropy Quantities. This can be broken down into 3 parts:

  1. Conversion into astropy.time.Time objects
  2. Conversion into astropy.units.Quantity objects
  3. Conversion into astropy.coordinates.SkyCoord objects

A method called _parse_values_to_quantities would be defined to convert all the numerical values for which unit information is provided, and to remove all the unit-defining properties from the dictionary. It will return a dictionary in which the string attributes are unchanged, the numerical attributes with defined units are converted to astropy Quantity objects, and the unit-defining properties are gone. A utility function called _string_to_units would also be defined to convert the string stored in a unit-defining property into an astropy unit. For example, the unit of the property fl_peakflux is defined in fl_peakfluxunit. It can be given in a format like “erg/cm^2/s”, which needs to be converted into an astropy unit like “u.erg/(u.cm**2 * u.s)”.

For coordinates, the default unit will be degrees (u.deg) unless another unit is explicitly given. I also intend to create a method called _parse_coordinates inside HEKClient that parses any coordinate property into an astropy.coordinates.SkyCoord object. Since SkyCoord can store multiple coordinates, it gives us direct support for storing bounding boxes. The utility function _string_to_units would be used here to convert the unit strings into astropy units for the coordinates.

The _parse_times function is already implemented; we just need to update it so that it handles a JSON object instead of updating the values directly in the table. Once that is done, parsing all the values into the astropy Table would automatically assign units to the values and store them. The changes in the _download function would now look something like this:

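# Inside HEKClient._download, once the results have been fetched:
# attach units to numerical values and drop the unit-defining properties
quantified_results = self._parse_values_to_quantities(results)
# convert coordinate properties into SkyCoord objects
quantified_results = self._parse_coordinates(quantified_results)
# convert time properties into Time objects
quantified_results = self._parse_times(quantified_results)
table = astropy.table.Table(dict_keys_same(quantified_results))
return table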
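The first of these steps can be sketched in a dependency-free form. To keep the example runnable without astropy, a quantity is represented here as a (value, unit string) tuple; in the proposed method the unit string would be passed through _string_to_units and the pair would become an astropy Quantity. The function below is a hypothetical standalone version that handles a single event dictionary, and the unit_props mapping stands in for the information read from hek_properties.json.

```python
def parse_values_to_quantities(event, unit_props):
    """Attach units to numerical values and drop the unit-defining keys.

    `unit_props` maps a property name to the name of the property holding
    its unit. Quantities are (value, unit) tuples in this sketch only.
    """
    unit_keys = set(unit_props.values())
    parsed = {}
    for key, value in event.items():
        if key in unit_keys:
            # Unit properties are folded into the values they describe.
            continue
        unit_key = unit_props.get(key)
        unit = event.get(unit_key) if unit_key else None
        if unit and isinstance(value, (int, float)):
            parsed[key] = (value, unit.strip())
        else:
            # Strings, and numbers without unit information, stay as they are.
            parsed[key] = value
    return parsed


event = {
    "event_type": "FL",
    "fl_peakflux": 1500.0,
    "fl_peakfluxunit": "erg/cm^2/s",
}
result = parse_values_to_quantities(event, {"fl_peakflux": "fl_peakfluxunit"})
print(result)
```

The unit-defining key no longer appears in the output, which is the reduction in redundancy that this change aims for.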

Creating new wrapper classes for attributes of different types

All the attributes can be compared to other values using comparison operators. This is done through operator overloading in the attribute classes they are objects of, or in the base classes those are derived from (all of them follow multi-level inheritance).

Even if the data type is an integer, a float, a long, or a quantity with some unit, we still use _StringParamAttrWrapper, which gives us no way to tell different properties and their types apart when looking at them. I intend to add a few more classes that provide that distinction and integrate astropy units with them, so that a number of validations can be made while creating the wrapper objects that are sent as a query to the API. The classes that I want to create are:

  1. _IntegerParamAttrWrapper: Wraps all integer type properties. This will be derived from the class HEKomparisonParamAttrWrapper.
  2. _FloatParamAttrWrapper: Wraps all float type properties. This will be derived from the class HEKomparisonParamAttrWrapper.
  3. _LongParamAttrWrapper: Wraps all long data type properties. This will be derived from the class HEKomparisonParamAttrWrapper.

Each property will be assigned a wrapper class based on its data type as given in hek_properties.json.

One more change I intend to make is to add a unit attribute to the HEKAttr class. It would be an optional attribute (an astropy Unit object) used only for numerical properties; for all other properties it would be None. This would add a layer of validation for the queries that users make, flagging a query as invalid when the units do not match.
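A rough sketch of such a validation is shown below. It is dependency-free: the DIMENSIONS table is a hypothetical replacement for the unit-compatibility check that astropy would provide, and HEKAttrSketch and validate_query_unit are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from a unit string to its physical dimension.
DIMENSIONS = {"km/s": "speed", "m/s": "speed", "angstrom": "length", "nm": "length"}


@dataclass(frozen=True)
class HEKAttrSketch:
    """Stand-in for an attribute with an optional unit."""
    name: str
    unit: Optional[str] = None  # None for non-numerical properties


def validate_query_unit(attr, query_unit):
    """Raise ValueError if `query_unit` cannot be used with `attr`."""
    if attr.unit is None:
        if query_unit is not None:
            raise ValueError(f"{attr.name} does not take a unit")
        return
    expected = DIMENSIONS.get(attr.unit)
    if query_unit not in DIMENSIONS or DIMENSIONS[query_unit] != expected:
        raise ValueError(
            f"{attr.name} expects a unit compatible with {attr.unit}, got {query_unit}"
        )


outflow = HEKAttrSketch("Outflow_Speed", "km/s")
validate_query_unit(outflow, "m/s")  # compatible: both are speeds
event_type = HEKAttrSketch("Event_Type")
validate_query_unit(event_type, None)  # no unit expected, none given
```

A query such as validate_query_unit(outflow, "nm") raises a ValueError, because a length cannot be compared with a speed.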

Timeline

| Weeks/Phases | Dates | Tasks to be completed |
| --- | --- | --- |
| Community Bonding Period | May 4 - May 28 | - Interact with mentors and members to align my objectives with the deliverables as much as possible, and get more familiar with the codebase by working on more issues and fixing bugs (if any). |
| Week 1-2 | May 29 - June 12 | - Create the hek_properties.json file for a homogeneous and formatted mapping between the properties and their details. |
| | | - Create the _parse_coordinates method and the _parse_values_to_quantities method to integrate astropy units into the Tables we store them in. |
| Week 3-4 | June 13 - June 26 | - Create unit tests for the newly added functionality. |
| | | - Fix the errors/bugs (if any). |
| Week 5-6 | June 27 - July 10 | - Create new wrapper classes to distinguish the different data types used by different properties. |
| | | - Add the validation function that matches the units and dimensions and is triggered before the HEKAttr object is created. |
| Week 7-8 | July 11 - July 24 | - Write unit tests for the new wrapper classes and the validation function. |
| | | - Fix the bugs and errors in the code (if any). |
| Week 9-10 | July 25 - August 7 | - Add a description to every HEK property to make the properties easier for users to understand. |
| Week 11-12 | Aug 8 - Aug 14 | - Review the code added, wait for reviews from mentors and community members, and make the changes suggested. |
| | | - Finish up a blog that would be maintained and constantly updated throughout the course of the program to document every change. |

Skills and Experience

My primary area of interest is Web Development, mainly Backend development, with proficiency in NodeJS, Django, and Ruby on Rails using both SQL and NoSQL databases. I also have some experience with ReactJS. I have been working with Git and GitHub for the last 3 years and am well-versed in Version Control Systems and their nitty-gritty.

Projects

  • Impresario: An application made using Django and PostgreSQL that helps users keep track of organisations, their sub-organisations, their members, and the events organised under them, making sure there are no clashes in the schedules of any people involved in any of the events. It uses Google API to integrate Google Calendars and send invites to all the attendees.
  • Wordplay - A skribbl.io clone: An online game similar to Skribbl.io, developed using NodeJS and MongoDB for the backend and Vanilla JS for the front end. It uses Web Sockets to let users create rooms with a name of their choice for others to join. Each user in the room gets the canvas in turn and is given a word to draw so that the other users can guess it.

Open Source Contribution

  • Sunpy Issue #6457: Worked on the issue to clean up and cover all the GOES files in the tests. The PR for this issue has been merged, and the issue has been closed.
  • Sunpy Issue #6239: Created a utility function that recursively traverses the tree in the XML header of a jp2 file and parses the comments and the history associated with the file and each element into the relevant dictionary. The PR for this is in review for now.
  • HNN-core Issue #544: This issue deals with adding a limit of the end of simulation time to all the plots so that they can all be aligned when plotted as subplots in the GUI. The PR is in the review stage as of now.
  • Checkstyle Issue #7598: This issue deals with adding examples to the documentation on how the checks are made for Javadocs. This PR has been merged, and the issue has been closed.

Why am I a good fit?

  • I see myself as a good team player with excellent communication skills. I have communicated with the Sunpy Team members and mentors of this project on the forum and on GitHub with the best of regards.
  • I understand the importance of testing and documentation. I will add all the necessary tests and maintain a blog explaining each part of the code for contributors extending these features.
  • I see this as the start of a long term association with Sunpy and Open Astronomy, and I’ll continue to contribute in the future.
  • This is my first time applying for GSoC, and I am pretty excited about it. I promise to dedicate myself to this cause and work with all my heart to achieve the objectives of this project and contribute the cleanest and best code that I can achieve. I can easily dedicate around 4-5 hours a day to the GSoC program as I will have my summer vacation during the whole period.
  • I have a lot of experience with NodeJS, Django, and Ruby on Rails. I am proficient in programming languages like Python, C, C++, Ruby, and JavaScript. I have also contributed to Open Source by submitting patches to some communities on GitHub. This makes it comparatively easy for me to work on this codebase and analyse it properly.