Create a team names dataframe to standardize variations #17
Comments
Here's what the csv for As you can see it has two columns. The first is the team name that appears in I don't think I started a similar file for the other leagues. A |
In terms of goals - I think it's definitely useful to have an index of all team names and their variants. This will help a lot in future data collection.
Could you perhaps automate adding new variants to the team name index and making it available on the repo? Then anyone can pitch in and manually annotate them. Might be useful if/when you add other European leagues whose teams you know less about!
Hi @JoGall - can you expand on what you mean by automating the addition of new variants? I'm not sure how to let other users edit the index (maybe I'm just missing something obvious). I've merged @aqsmith08's pull request, which created a new teamname df. I will add England names to this, and I might modify it a bit later today.
Hey @JoGall - I'd love to hear more too. Are you looking for something like --
We create a function that would take all the Also, I should add that I chose not to change any team names that @jalapic already had in the CSV files (e.g. |
@aqsmith08 @JoGall - ok, in the new GitHub version of the package there is a .rda file in ./data (and a corresponding csv file in ./data-raw) called
The
If you have any ideas on improving this, let me know. It definitely helps me in cleaning up the data. |
Yes @aqsmith08 that's exactly what I was thinking. Maybe a snippet of code that could run when importing new leagues that automatically adds unknown names to
which returns:
Of course we'd have to review them manually from this point, but at least we'd know which teams still need doing. @jalapic you could update the 'teamnames' dataframe each time you add new data and add it to the repo (perhaps as a CSV) so anyone can pitch in and help?
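For anyone following along, here is a rough sketch of that "flag unknown names for manual review" step. It is written in plain Python purely for illustration, not in the package's R code, and it is not the function posted in this thread. The function name `find_unindexed_names`, the column names `name` and `name_other`, and the sample rows are all assumptions.

```python
def find_unindexed_names(observed_names, index_rows):
    """Return observed team names that are missing from the index.

    index_rows is a list of dicts with two hypothetical columns:
    "name" (preferred name) and "name_other" (a known variant).
    """
    known = set()
    for row in index_rows:
        known.add(row["name"])
        known.add(row["name_other"])
    # De-duplicate and sort so the review list is stable between runs.
    return sorted(set(observed_names) - known)


# Hypothetical index and newly imported names, for illustration only.
index_rows = [
    {"name": "Manchester United", "name_other": "Man Utd"},
    {"name": "Manchester United", "name_other": "Manchester United"},
]
observed = ["Man Utd", "Manchester Utd", "Leeds United", "Manchester Utd"]

for name in find_unindexed_names(observed, index_rows):
    print(name)
```

The names it prints are the ones someone would still need to annotate by hand.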
Made this function to add unindexed team names to the
TODO: another function to replace alternative team names with their preferred name. e.g. if 'home' =
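One possible shape for that TODO, sketched in plain Python rather than the package's R: build a lookup from variant to preferred name, then rewrite the team columns of each match. The function name `standardize_names`, the index columns `name` and `name_other`, the `visitor` column, and the sample data are assumptions made for this example.

```python
def standardize_names(matches, index_rows, columns=("home", "visitor")):
    """Replace variant team names with their preferred name.

    Names that are not in the index are left unchanged so they can be
    picked up later by a manual review.
    """
    lookup = {row["name_other"]: row["name"] for row in index_rows}
    cleaned = []
    for match in matches:
        fixed = dict(match)  # copy, so the input rows are not modified
        for col in columns:
            fixed[col] = lookup.get(match[col], match[col])
        cleaned.append(fixed)
    return cleaned


# Hypothetical index and match rows, for illustration only.
index_rows = [
    {"name": "Manchester United", "name_other": "Man Utd"},
    {"name": "Leeds United", "name_other": "Leeds"},
]
matches = [
    {"home": "Man Utd", "visitor": "Leeds", "FT": "2-1"},
    {"home": "Unknown FC", "visitor": "Man Utd", "FT": "0-0"},
]

for row in standardize_names(matches, index_rows):
    print(row["home"], "v", row["visitor"])
```

Leaving unknown names untouched, rather than raising an error, is a design choice that keeps the import running; a stricter version could collect them and fail at the end.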
Just wanted to check in and see how we're doing here. It seems like we've successfully completed the focus of this issue -- Create a team names dataframe to standardize variations. I double-checked all the leagues we have in teamnames.csv and it seems like every country league listed in the data-raw folder is covered. Is that correct or am I missing anything? I do think there are other things we can do (e.g. adding new functions, update the
@aqsmith08 @JoGall Agreed - thanks both. I'll close this issue. We could open a new issue to check I have added the function for checking if a teamname is unique and adding to teamname dataframe to |
Hi @jalapic ,
I wanted to start an issue to track the team names dataframe work since it has come up a couple times. For example, you have it listed in the README:
and it's also been mentioned in your discussion with @JoGall in issue #16.
Before any work is done, it'd be great to clarify what your ideal outcome is. For example, are you looking for a dataframe like this --
Yes, I apologize for using NFL teams here but it was the quickest example I could think of.
By having these three columns, folks will only have to check a subset of team names each year (e.g. only where most_recent == true) and can create a new row if something has changed.
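To make the yearly check concrete, here is a small Python sketch of that workflow (illustration only, the package itself is R). Only `most_recent` comes from the proposal above; the other column names (`team_id`, `team_name`), the function names, and the placeholder teams are assumptions.

```python
def names_to_review(rows):
    """Return only the rows flagged as the current name for a team."""
    return [row for row in rows if row["most_recent"]]


def record_name_change(rows, team_id, new_name):
    """Return a new table where the old current name is retired and a new row is added."""
    updated = []
    for row in rows:
        if row["team_id"] == team_id and row["most_recent"]:
            # Keep the old name as history, but it is no longer current.
            updated.append({**row, "most_recent": False})
        else:
            updated.append(dict(row))
    updated.append({"team_id": team_id, "team_name": new_name, "most_recent": True})
    return updated


# Placeholder rows, not real teams.
rows = [
    {"team_id": 1, "team_name": "Example City", "most_recent": False},
    {"team_id": 1, "team_name": "Example City Rovers", "most_recent": True},
    {"team_id": 2, "team_name": "Sample Town", "most_recent": True},
]

rows = record_name_change(rows, 2, "Sample Town Athletic")
for row in names_to_review(rows):
    print(row["team_id"], row["team_name"])
```

After the update, each team still has exactly one row where `most_recent` is true, so the subset to check each year stays small.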
What do you think?