The following list contains source-code-related inconsistencies that do not have any user-visible effects:
- Check for duplicated rows when reading data in `util-read.R` (we currently do that for commit data, but not for issue/mail/bot/author/commit-messages/gender/pasta/... data). If we detect duplicates, we should print a warning and remove the duplicate entries.
- Tidy up the code sections in `util-read.R`: Currently, there are two sections named "Helper functions". These sections should be merged into a single section.
- Reconsider checking the caller of functions when printing warnings regarding the `cleanup.[...].data` functions at the end of the `update.[...].data` functions for additional data sources in `util-data.R`: At the moment, these warnings are used differently for different data sources. The goal is to prevent the warning from being printed if the `update` function is called from the `cleanup` function. However, this might be handled differently for different data sources. Therefore, we should check whether this is consistent among all of them and also whether it is useful for all of them.
- Reorder the fields (i.e., attributes) of the data class: Field `commit.messages` seems to be out of order in the list of fields. Also check this for several other lists and order their items either alphabetically or semantically, to ease maintenance via consistent orderings.
- Think about removing row names when reading data.
- Decide on a spelling variant of the term "data frame" in code comments / documentation. Currently, we use a variety of different spelling variants (e.g., `data frame`, `data-frame`, `dataframe`, `dataFrame`, `Dataframe`, `DataFrame`, etc.). Once we agree on a single spelling variant, we should also add it to the contribution guide.
- Reflect on whether defining an `ID` constant would help to map a data-source name to its corresponding ID column. E.g., the ID column for mails is the message ID, for issue events the event ID (not the issue ID!), for commits the commit hash (or the commit ID? Both would be possible...). Such an `ID` constant would make it faster to determine which column of a data source is the ID column and would allow accessing the ID column without knowing the data source.
- Reflect on the sample data: Currently, the `sample` data only contains commit and mail data. This is enough to demonstrate how data are read and how networks are built. However, it may be useful to also add other data sources to the `sample` data.
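For the duplicate check in `util-read.R`, a minimal sketch could look as follows. The helper name `remove.duplicate.rows` is hypothetical (it does not exist in the code base); it uses base R's `duplicated()` and `warning()`, whereas an actual implementation would probably use the project's logging calls and should be aligned with the existing duplicate check for commit data. It also resets the row names after removing rows, which is one possible way to handle the row-names item.

```r
## Hypothetical helper (not part of the current code base): removes duplicated
## rows from a data frame that has just been read and warns if any were found.
remove.duplicate.rows = function(data, data.source) {
    ## logical vector: TRUE for every row that is an exact copy of an earlier row
    duplicated.rows = duplicated(data)
    num.duplicates = sum(duplicated.rows)

    if (num.duplicates > 0) {
        warning(sprintf("Removed %d duplicated row(s) from the %s data.",
                        num.duplicates, data.source))
        data = data[!duplicated.rows, , drop = FALSE]
        ## reset row names so that they are consecutive again after subsetting
        rownames(data) = NULL
    }

    return(data)
}

## usage example with a small, made-up commit data frame
commits = data.frame(hash = c("a1", "b2", "a1"),
                     author.name = c("Alice", "Bob", "Alice"))
commits = suppressWarnings(remove.duplicate.rows(commits, "commit"))
nrow(commits) # returns 2
```

Only exact duplicates (all columns equal) are removed here; whether duplicates should instead be determined via an ID column is a separate design decision.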
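For the `ID` constant, one possible shape is a named list that maps data-source names to ID-column names, plus a small accessor. The constant `DATA.SOURCE.ID.COLUMNS`, the function `get.id.column`, and the column names `hash`, `message.id`, and `event.id` are assumptions for illustration only; they would need to be checked against the column names actually produced in `util-read.R`.

```r
## Hypothetical constant (not part of the current code base): maps each data
## source to the name of its ID column. The column names are assumptions.
DATA.SOURCE.ID.COLUMNS = list(
    commits = "hash",      # could also be the commit ID, see discussion
    mails = "message.id",
    issues = "event.id"    # the event ID, not the issue ID
)

## Hypothetical accessor: returns the ID column of a data frame, given only
## the name of its data source.
get.id.column = function(data, data.source) {
    ## `[[` on a list returns NULL for unknown names
    id.column = DATA.SOURCE.ID.COLUMNS[[data.source]]
    if (is.null(id.column)) {
        stop(sprintf("No ID column defined for data source '%s'.", data.source))
    }
    return(data[[id.column]])
}

## usage example with a small, made-up mail data frame
mails = data.frame(message.id = c("<m1@example.org>", "<m2@example.org>"),
                   author.name = c("Alice", "Bob"))
get.id.column(mails, "mails")
```

Because `[[` on a named list silently yields `NULL` for unknown names, the accessor turns that case into an explicit error instead of returning `NULL`.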