GSoC 2023 Aritra Sinha
I am a final year Electronics and Communication Engineering student from the National Institute of Technology Karnataka, Surathkal, India. I am a programming enthusiast with a profound interest in Backend Development and Machine Learning. I am passionate about turning my ideas into code and building new things. Apart from programming, I like reading crime thrillers in my free time.
- Full Name: Aritra Sinha
- University/College: National Institute of Technology Karnataka, Surathkal, India.
- Email: aritrasinha002@gmail.com, aritrasinha.191ec108@nitk.edu.in
- Time Zone: IST (GMT + 5:30)
- Github: https://github.com/aritrasinha108
- LinkedIn: https://www.linkedin.com/in/aritra-sinha-0251101a3/
- Element Profile: @johnyinglis:matrix.org
- Resume: https://drive.google.com/file/d/1SS8mYEqCKrcA8E55QIbQArlQE-gf2zTv/view?usp=sharing
Sunpy provides access to solar feature and event data held by the Heliophysics Event Knowledgebase (HEK). Features and events have many properties. Some are specific to a particular event or feature type, while others are global properties common to all features and events; together they help to characterise, categorise, and distinguish them. A few examples of global attributes are `Event_Type` (type of the event), `Event_Coordsys` (coordinate system used for the event), `Event_CoordUnit` (unit of the coordinates), `Event_StartTime` (the time when the event starts), and `Event_EndTime` (the time when the event ends). On the other hand, properties like `CME_RadialLinVel` (linear-fit radial velocity, relevant to Coronal Mass Ejections (CMEs) only) and `AR_SpotAreaRaw` (area of Active Regions (ARs) in the plane of the sky) are event/feature-specific properties. A detailed list and specification of all the feature and event properties are mentioned here.
The current implementation and representation of HEK features and events involves the following classes, modules, or files:
- `sunpy.net.hek.hek.HEKClient`: This class (a subclass of `sunpy.net.base_client.BaseClient`) is used to communicate with the HEK (the HER API, more specifically) to fetch solar data matching the parameters and criteria given by the user.
- `sunpy.net.hek.hek.HEKTable`: A container (ultimately a subclass of `astropy.table.QTable`) that helps us handle the data returned from HEK after making a query.
- `sunpy.net.hek.hek.HEKRow`: This is a subclass of `astropy.table.Row` that helps us handle the response from HEK. The column-row key-value pairs correspond to the feature/event properties and their values. It also has other properties that relate HEK to VSO (Virtual Solar Observatory) concepts.
- `sunpy/tools/hektemplate.py`: Because of the large number of properties and the complex way in which they are associated with the different features and events in the HEK, it is hard to implement the classes by hand and associate them with the correct attributes. The whole process has therefore been automated with a code-generation Python script that generates the classes for all the events and features. This file provides the base classes for Events and Features, plus wrapper classes for the properties, so that operator overloading (dunder/magic methods) can be used for comparisons while making queries to the API.
- `sunpy/tools/hek_mkcls.py`: This script uses a manually maintained dictionary that maps properties to attributes to generate the source code for the classes representing the different features, events, and their properties, and writes it into a target file (we use `sunpy/net/hek/attrs.py` as the target file).
The current implementation has several problems that need to be addressed to make things simpler, more automated, and easier to understand. The issues are:
The `HEKClient.search` method uses the `HEKClient._download` method to fetch the data through the HEK API and then stores it in an astropy table (`astropy.table.QTable`). For many of the numerical (float, integer, or long type) properties, such as `PeakPower` (peak power of oscillation) and `Outflow_Speed` (speed of the outflow), the values are stored as plain numbers even though their units are available to us in the form of another property (`PeakPowerUnit` for `PeakPower` and `Outflow_SpeedUnit` for `Outflow_Speed`).
Even though columns in astropy tables can carry units, the units are stored as separate columns, which makes a lot of the data redundant. They need to be integrated as astropy units into the columns they describe.
The idea is to remove from the resulting astropy tables the columns that only give the unit of another property, and to use the information they provide to assign a unit to the columns they refer to. This would significantly reduce both the amount of data stored and its redundancy, making things simpler.
Queries made with the `HEKClient.search` method support many conditions beyond simple equality (greater than, less than, AND, OR, LIKE, etc.). These are made possible by creating different classes for the different Features, Events, and the Properties assigned to them, along with some wrapper classes whose magic/dunder methods overload the operators and define the corresponding operations. One important point to note here is that all the properties of the various Events and Features are instances of the wrapper class `_StringParamAttrWrapper`, irrespective of their actual data type.
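The operator overloading described above can be sketched in a few lines. This is a simplified illustration, not the actual sunpy source: the class names `ParamAttr`, `ComparisonWrapper`, and `StringWrapper` are hypothetical stand-ins, and the operator strings are chosen only for readability.

```python
class ParamAttr:
    """Hypothetical stand-in for one query clause: name, operator, value."""

    def __init__(self, name, op, value):
        self.name, self.op, self.value = name, op, value

    def __repr__(self):
        return f"ParamAttr({self.name!r}, {self.op!r}, {self.value!r})"


class ComparisonWrapper:
    """Turns Python comparisons on a property into ParamAttr clauses."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        return ParamAttr(self.name, "<", other)

    def __le__(self, other):
        return ParamAttr(self.name, "<=", other)

    def __gt__(self, other):
        return ParamAttr(self.name, ">", other)

    def __ge__(self, other):
        return ParamAttr(self.name, ">=", other)

    def __eq__(self, other):
        return ParamAttr(self.name, "=", other)

    def __ne__(self, other):
        return ParamAttr(self.name, "!=", other)


class StringWrapper(ComparisonWrapper):
    """Adds the string-only 'like' operation on top of the comparisons."""

    def like(self, other):
        return ParamAttr(self.name, "like", other)


# Every property is wrapped the same way today, even a numerical one.
peak_flux = StringWrapper("fl_peakflux")
clause = peak_flux > 1000.0
print(clause)  # ParamAttr('fl_peakflux', '>', 1000.0)
```

Because the wrapper knows nothing about the type of the property, nothing stops a query such as `peak_flux.like("abc")` from being built for a numerical property.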
The idea here is to have separate wrapper classes for different data types, which will help us simplify the implementation of their operator functions and incorporate the associated units. This will enable us to remove all the properties that just provide the information regarding units of other properties, thus significantly reducing the number of properties and attribute objects, making things easier for us to maintain.
The flow of control while fetching data through the HEK API and converting it into an `HEKTable` instance (a subclass of `sunpy.net.base_client.QueryResponseTable`, which in turn is derived from `astropy.table.table.QTable`) starts inside the `HEKClient.search` method, where the query is taken as an argument and converted into a JSON object. It then calls the `HEKClient._download` method with the JSON query object as an argument. Inside `_download`, the JSON query is added as a query string to the HEK URL, to which a GET request is made to fetch the data. Once the results are fetched, the whole list of JSON objects is parsed into an astropy table and returned. The unit columns are passed into the table along with the other properties.
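The step where the query becomes a query string can be illustrated as follows. This is only a sketch: the endpoint URL and the parameter names in `query` are assumptions made for illustration, `build_query_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
HEK_URL = "https://www.lmsal.com/hek/her"


def build_query_url(query, base_url=HEK_URL):
    """Append the query dictionary to the base URL as a query string."""
    return f"{base_url}?{urlencode(query)}"


# Hypothetical query for flare events in a time range.
query = {
    "cmd": "search",
    "type": "column",
    "event_type": "FL",
    "event_starttime": "2011-08-09T07:23:56",
    "event_endtime": "2011-08-09T12:40:29",
}
url = build_query_url(query)
print(url)
```

The GET request would then be made against `url`, and the JSON body of the response would be the list of records that gets parsed into the table.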
I intend to create a JSON file, `hek_properties.json`, that would hold all the information on the HEK properties and attributes. This includes the description of each property, its data type, and the name of the property that gives its unit. The JSON file would have elements looking like this:
```json
{
    "obs_meanwavel": {
        "type": "float",
        "desc": "Mean wavelength (preferably in Angstroms)",
        "unit_prop": "obs_wavelunit"
    }
}
```
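As a minimal sketch of how such a file could be consulted, the snippet below inlines the JSON text so that it runs on its own. The second entry (`obs_wavelunit`) and its `"string"` type label, as well as the helper names `unit_property_for` and `unit_defining_properties`, are assumptions added for illustration.

```python
import json

# In the proposal this text would live in hek_properties.json.
HEK_PROPERTIES_JSON = """
{
    "obs_meanwavel": {
        "type": "float",
        "desc": "Mean wavelength (preferably in Angstroms)",
        "unit_prop": "obs_wavelunit"
    },
    "obs_wavelunit": {
        "type": "string",
        "desc": "Unit of obs_meanwavel"
    }
}
"""

properties = json.loads(HEK_PROPERTIES_JSON)


def unit_property_for(name, properties):
    """Return the name of the property holding the unit of `name`, or None."""
    return properties.get(name, {}).get("unit_prop")


def unit_defining_properties(properties):
    """Return the properties that only describe another property's unit."""
    return {spec["unit_prop"] for spec in properties.values() if "unit_prop" in spec}


print(unit_property_for("obs_meanwavel", properties))  # obs_wavelunit
print(unit_defining_properties(properties))            # {'obs_wavelunit'}
```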
The JSON file can then be used to look up the unit property for any attribute. This file would also serve other purposes, which are described later. The next step is the conversion of the raw property values into the appropriate astropy objects. This can be broken down into 3 parts:
- Conversion into `astropy.time.Time` objects
- Conversion into `astropy.units.Quantity` objects
- Conversion into `astropy.coordinates.SkyCoord` objects
A method called `_parse_values_to_quantities` would be defined to convert all the numerical values whose unit information is provided. It also removes all the unit-defining properties from the dictionary. The method returns a dictionary in which the string attributes are the same as before and the numerical attributes with a defined unit are converted to `astropy.units.Quantity` objects.
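The behaviour described above could look roughly like the sketch below. It works on a single record and, to stay self-contained, pairs each value with its unit string in a tuple instead of building a real astropy Quantity. The `make_quantity` hook is a hypothetical parameter: with astropy installed one might pass `lambda v, unit: u.Quantity(v, unit)`, provided astropy can parse the unit string.

```python
def parse_values_to_quantities(record, properties,
                               make_quantity=lambda v, unit: (v, unit)):
    """Attach units to numerical values and drop the unit-defining keys.

    `record` is one event as a dict; `properties` maps property names to
    their spec, in the shape proposed for hek_properties.json.
    """
    unit_keys = {spec["unit_prop"] for spec in properties.values()
                 if "unit_prop" in spec}
    parsed = {}
    for key, value in record.items():
        if key in unit_keys:
            continue  # unit-only property: folded into the value it describes
        unit_key = properties.get(key, {}).get("unit_prop")
        unit = record.get(unit_key) if unit_key else None
        if unit and value is not None:
            parsed[key] = make_quantity(value, unit)
        else:
            parsed[key] = value  # strings and unit-less values pass through
    return parsed


properties = {"fl_peakflux": {"type": "float", "unit_prop": "fl_peakfluxunit"}}
record = {"event_type": "FL", "fl_peakflux": 1234.5,
          "fl_peakfluxunit": "erg/cm^2/s"}
parsed = parse_values_to_quantities(record, properties)
print(parsed)  # {'event_type': 'FL', 'fl_peakflux': (1234.5, 'erg/cm^2/s')}
```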
A utility function called `_string_to_units` would also be defined to convert the string given in a unit-defining property into astropy units. For example, for the property `fl_peakflux`, the unit is defined in `fl_peakfluxunit`. The unit can be in a format like "erg/cm^2/s", which needs to be converted into an astropy unit like `u.erg / (u.cm**2 * u.s)`. For coordinates, the default unit will be degrees (`u.deg`) unless another unit is explicitly specified.
I also intend to create a method called `_parse_coordinates` inside `HEKClient` that parses any property that is a coordinate into an `astropy.coordinates.SkyCoord` object. Since `SkyCoord` supports storing multiple coordinates, it gives us direct support for storing bounding boxes. The utility function `_string_to_units` would be used here to convert the coordinate unit strings into astropy units.
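The first half of that job, extracting the numbers from a bounding box, can be sketched without astropy. The snippet assumes that a bounding box arrives as a WKT-style `POLYGON((x y, x y, ...))` string; that format, the sample values, and the helper name `parse_polygon` are assumptions for illustration.

```python
import re


def parse_polygon(text):
    """Parse 'POLYGON((x1 y1,x2 y2,...))' into parallel lists of floats."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", text)
    if match is None:
        raise ValueError(f"not a POLYGON string: {text!r}")
    xs, ys = [], []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        xs.append(float(x))
        ys.append(float(y))
    return xs, ys


# Made-up bounding box with five vertices (the last one closes the polygon).
bbox = "POLYGON((-10.5 20,30 20,30 40.25,-10.5 40.25,-10.5 20))"
xs, ys = parse_polygon(bbox)
print(xs)  # [-10.5, 30.0, 30.0, -10.5, -10.5]
print(ys)  # [20.0, 20.0, 40.25, 40.25, 20.0]
```

The two lists, multiplied by the parsed coordinate unit (degrees by default), could then be handed to `SkyCoord` together with the appropriate frame, so that one `SkyCoord` holds all the vertices of the box.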
The `_parse_times` function is already implemented; we just need to update it so that it operates on the JSON object instead of updating the values directly in the table.
Once we do that, parsing all the values into the astropy Table would automatically assign units to the values and store them. The changes in the `_download` function would look something like this:
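```python
# Tail of HEKClient._download, once `results` holds the list of JSON records.
# dict_keys_same (from sunpy.util) normalises the keys across the records.
quantified_results = self._parse_values_to_quantities(results)
quantified_results = self._parse_coordinates(quantified_results)
quantified_results = self._parse_times(quantified_results)
table = astropy.table.Table(dict_keys_same(quantified_results))
return table
```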
All the attributes can be compared to other values using comparison operators. This is done by operator overloading in the attribute classes they are instances of, or in the base classes those are derived from (all of them follow multi-level inheritance).
Even if the data type is an integer, a float, a long, or a quantity with some unit, we still use `_StringParamAttrWrapper`, which gives no way to distinguish between properties and their types. I intend to add a few more classes that provide that distinction and integrate astropy units with them, so that a number of validations can be made while creating the wrapper objects that are sent as a query to the API. The classes that I want to create are:
- `_IntegerParamAttrWrapper`: Wraps all integer-type properties. It will be derived from the class `_ComparisonParamAttrWrapper`.
- `_FloatParamAttrWrapper`: Wraps all float-type properties. It will be derived from the class `_ComparisonParamAttrWrapper`.
- `_LongParamAttrWrapper`: Wraps all long-type properties. It will be derived from the class `_ComparisonParamAttrWrapper`.
Each property will be assigned a wrapper class based on the data type listed for it in `hek_properties.json`.
One more change I intend to make is to add a unit attribute to the `HEKAttr` class. The unit would be an optional attribute (an astropy `Unit` object) used only for numerical properties; it would be `None` for all other properties. This would add a layer of validation for user queries, so that a query whose units do not match those of the property can be flagged as invalid.
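The kind of check this enables can be sketched without astropy by mapping a few unit names to their physical dimension. The class `HEKAttrSketch`, its `validate` method, and the `DIMENSION` table are hypothetical; with real astropy units the comparison would more naturally be done with `Unit.is_equivalent`.

```python
# Tiny stand-in for unit equivalence: unit name -> physical dimension.
DIMENSION = {
    "cm": "length", "km": "length", "angstrom": "length",
    "s": "time", "deg": "angle", "arcsec": "angle",
}


class HEKAttrSketch:
    """Hypothetical attribute that carries an optional unit."""

    def __init__(self, name, unit=None):
        self.name = name
        self.unit = unit  # None for properties that have no unit

    def validate(self, value_unit):
        """Raise ValueError if the unit of a query value cannot match."""
        if self.unit is None:
            if value_unit is not None:
                raise ValueError(f"{self.name} does not take a unit")
            return
        expected = DIMENSION.get(self.unit)
        given = DIMENSION.get(value_unit)
        if expected is None or expected != given:
            raise ValueError(
                f"{self.name} expects a unit compatible with {self.unit!r}, "
                f"got {value_unit!r}")


wavelength = HEKAttrSketch("obs_meanwavel", unit="angstrom")
wavelength.validate("cm")      # passes: both are lengths
try:
    wavelength.validate("s")   # a time cannot describe a wavelength
except ValueError as error:
    print(error)
```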
Weeks/Phases | Dates | Tasks to be completed |
---|---|---|
Community Bonding Period | May 4 - May 28 | - Interact with mentors and members to align my objectives with the deliverables as much as possible and try to get myself more familiar with the codebase by working on more issues and fixing bugs (if any). |
Week 1-2 | May 29 - June 12 | - Create the hek_properties.json file for a homogeneous and formatted mapping between the properties and their details. - Create the _parse_coordinates method and the _parse_values_to_quantities method to integrate astropy units into the Tables we store them in. |
Week 3-4 | June 13 - June 26 | - Create unit tests for the newly added functionality - Fix the errors/bugs (if any) |
Week 5-6 | June 27 - July 10 | - Create new wrapper classes to distinguish different data types used by different properties. - Add the validation function to match the units and dimensions that will be triggered before the HEKAttr object is created. |
Week 7-8 | July 11 - July 24 | - Write unit tests for the new wrapper classes and the validation function - Fix the bugs and errors in the code (if any) |
Week 9-10 | July 25 - August 7 | - Work on adding a description to each and every HEK property to make the properties easier for users to understand. |
Week 11-12 | Aug 8 - Aug 14 | - Review the code added, and wait for reviews from mentors and community members and make the changes suggested. - Finish up a blog that would be maintained and constantly updated throughout the course of the program to document every change. |
My primary areas of interest include Web Development, mainly focusing on Backend development with proficiency in NodeJS, Django, and Ruby on Rails using both SQL and NoSQL databases. I also have some experience with ReactJS. I have been working with Git and GitHub for the last 3 years and am well-versed in Version Control Systems and their nitty-gritty.
- Impresario: An application made using Django and PostgreSQL that helps users keep track of organisations, their sub-organisations, their members, and the events organised under them, making sure there are no clashes in the schedules of any people involved in any of the events. It uses Google API to integrate Google Calendars and send invites to all the attendees.
- Wordplay- A skribbl.io clone: An online game similar to Skribbl.io developed using NodeJS and MongoDB for the backend, while Vanilla JS for the front end. It uses Web Sockets to allow users to create their own rooms with the name of their choice and for others to join where each user in the room will get a canvas one by one, and they will be given a word they have to draw to make other users guess the word.
- Sunpy Issue #6457: Worked on the issue to clean up and cover all the GOES files in the tests. The PR for this issue has been merged, and the issue has been closed.
- Sunpy Issue #6239: Created a utility function that recursively traverses the tree in the XML header of a jp2 file and parsed the comments and the history associated with the files and each element to the relevant dictionary. The PR for this is in review for now.
- HNN-core Issue #544: This issue deals with adding the end of the simulation time as a limit to all the plots so that they are aligned when plotted as subplots in the GUI. The PR is in the review stage as of now.
- Checkstyle Issue #7598: This issue deals with adding examples to the documentation on how the checks are made for Javadocs. This PR has been merged, and the issue has been closed.
- I see myself as a good team player with strong communication skills. I have communicated respectfully with the Sunpy team members and the mentors of this project on the forum and on GitHub.
- I understand the importance of testing and documentation. I will add all the necessary tests and maintain a blog explaining each part of the code for contributors extending these features.
- I see this as the start of a long-term association with Sunpy and Open Astronomy, and I’ll continue to contribute in the future.
- This is my first time applying for GSoC, and I am pretty excited about it. I promise to dedicate myself to this cause and work with all my heart to achieve the objectives of this project and contribute the cleanest and best code that I can achieve. I can easily dedicate around 4-5 hours a day to the GSoC program as I will have my summer vacation during the whole period.
- I have a lot of experience with NodeJS, Django, and Ruby on Rails. I am proficient in programming languages like Python, C, C++, Ruby, and JavaScript. I have also contributed to Open Source by submitting patches to some communities on GitHub. Thus, it is comparatively easy for me to work on this codebase and analyse it properly.