Results from current season in engsoccerdata
#16
I will try to implement this. It's a good idea and something that I frequently have to do. I have to make sure that the data are OK to take and include in a package. Also, the big issue is team names - my package may use slightly different versions of names for two or three teams. I have a dataframe of all possible versions of team names for each team, e.g. "Man Utd", "Manchester Utd", "Manchester United", "Man United", "Newton Heath", "Newton H", etc. That should allow us to ensure that the most recent season's data can be used with the historical data by team.
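The lookup described above could be sketched roughly as follows. This is a minimal illustration in Python rather than the package's own R code: the variants are the ones quoted in the comment, but the choice of "Manchester United" as the canonical form and the names `TEAM_ALIASES`, `canonical_team` and `unknown_teams` are assumptions made for the example.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
# The canonical form chosen here is an assumption, not the package's choice.
TEAM_ALIASES = {
    "Man Utd": "Manchester United",
    "Manchester Utd": "Manchester United",
    "Manchester United": "Manchester United",
    "Man United": "Manchester United",
    "Newton Heath": "Manchester United",
    "Newton H": "Manchester United",
}


def _clean(name):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(name.split())


def canonical_team(name, aliases=TEAM_ALIASES):
    """Return the canonical name, or the cleaned name if it is not listed."""
    cleaned = _clean(name)
    return aliases.get(cleaned, cleaned)


def unknown_teams(names, aliases=TEAM_ALIASES):
    """List the spellings that have no entry in the alias table yet."""
    return sorted({_clean(n) for n in names if _clean(n) not in aliases})
```

Running `unknown_teams` over the team columns of a newly downloaded season is one way to spot spellings that still need an entry before the new rows are merged with the historical data.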
Football-Data.co.uk claim on their website that data are "FREE" and that "You are free experiment with the data yourselves", but it might be worth checking with them just in case. I hadn't thought about team name variations, so a separate dataframe is a good idea. Good luck with the implementation!
I altered the function a bit - there was a typo and I've stripped it down slightly. I think the best thing is to have one function that brings all the data in from England and then puts it into
I will add this as a function to the package and leave it on GitHub. If I have time, I'd like to add this for the other leagues too. As a note - if you're interested in collating data / helping, I have other leagues and competitions going all the way back to their origins, e.g. League Cup, French League - I just haven't had time to check them and add them to the package yet.
oops.... forgot about the teamnames fix. ugh - that will take time. I notice a lot of inconsistencies with my data, e.g.
I had a bit of spare time this morning so wrote code to fetch data for the five other leagues available on Football-Data.co.uk. Their data only goes back to '94/'95 but better than nothing. I've left the 'division' variable as a factor for now rather than numeric, e.g. Scotland's divisions are defined as SC0, SC1, etc... I'm happy to help collate more data whenever I get a chance if you add them to the repo. Where did you obtain your data from by the way? It would be great to have European Cup fixtures too for completeness but can't find an archive of them anywhere.
@JoGall Thanks Joe. This is great. Will take a closer look at it - happy to add more leagues. I do have European Cup / Champions League data - it's in the The data come from everywhere - all open source. I believe a number of the sources are listed in the ReadMe. I did notice when I collated this a few years ago that a lot of the online websites with soccer data had copied each other, errors included. That only affects about 0.1% of the data, but it's annoying nonetheless.
@JoGall Hi Joe - I had a look at importing the other leagues. On my first pass, the csvs imported for the Greek league would not all convert to tidy data using the convert function. Also, other leagues sometimes returned NAs in the csvs. I think these will work, but we'd have to check each file in turn before adding to the package. Also, adding a "Season" variable to each csv would be super useful - that would keep the data consistent with the other dataframes.
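One way to derive a "Season" value from each match date is sketched below, in Python for illustration rather than the package's R. It rests on several assumptions that are not stated in the thread: that dates arrive as dd/mm/yy or dd/mm/yyyy strings in a "Date" column, that a season is labelled by the year in which it starts, and that July is a safe cutoff month. The function names are hypothetical.

```python
from datetime import datetime


def parse_match_date(text):
    """Parse a dd/mm/yy or dd/mm/yyyy string (assumed formats) into a date."""
    for fmt in ("%d/%m/%y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def season_start_year(match_date, first_month=7):
    """Label a match by the year its season started (assumed July cutoff)."""
    if match_date.month >= first_month:
        return match_date.year
    return match_date.year - 1


def add_season(rows):
    """Return copies of the row dicts with a 'Season' key added.

    Rows whose 'Date' is missing or empty get Season=None so they can be
    checked by hand instead of being silently dropped.
    """
    out = []
    for row in rows:
        raw = (row.get("Date") or "").strip()
        season = season_start_year(parse_match_date(raw)) if raw else None
        out.append({**row, "Season": season})
    return out
```

Keeping the rows with missing dates, rather than discarding them, fits the point above that each file would need checking in turn.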
Ok I've updated this and tested it properly now. There were some inconsistencies in the CSVs (e.g. some columns had 'HT' instead of 'HomeTeam') and annoyingly the division names for some leagues are zero indexed and some aren't, making it hard to parse 'tier' properly. The convert function seems to work now for all the leagues; I've added a boolean parameter to help create a tier number from the division data (e.g. when 'zeroIndexed' =
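The two fixes mentioned above, renaming inconsistent headers and deriving a tier from a division code, might look something like the sketch below. It is written in Python for illustration and is not the function from the thread: the helper names are hypothetical, only the 'HT' to 'HomeTeam' rename and the SC0/SC1 codes come from the comments, and the assumption is that every division code ends in a number.

```python
import re

# Only the HT -> HomeTeam rename is mentioned in the thread; any further
# entries would have to be confirmed against the actual CSV files.
HEADER_ALIASES = {"HT": "HomeTeam"}


def normalise_header(columns, aliases=HEADER_ALIASES):
    """Trim each column name and replace known variants with the standard one."""
    return [aliases.get(c.strip(), c.strip()) for c in columns]


def tier_from_division(division, zero_indexed):
    """Turn a division code such as 'SC0' into a tier number.

    Assumes the code ends in digits. With zero_indexed=True the top
    division is numbered 0, so 1 is added; otherwise the number is used
    unchanged.
    """
    match = re.search(r"(\d+)$", division.strip())
    if match is None:
        raise ValueError(f"no trailing number in division code {division!r}")
    number = int(match.group(1))
    return number + 1 if zero_indexed else number
```

For example, `tier_from_division("SC0", zero_indexed=True)` gives 1, while a league whose top division code already ends in 1 needs `zero_indexed=False` to avoid being assigned tier 2.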
Sorry for the slow action on this - when adding the data I was running checks to ensure CRAN compatibility etc., which always take longer. All the data for these seasons have been added. Thanks for your help. I'd love to get more data going further back, but this is a great addition. I noted one error with the function - it assigns tier=2 for Belgium/Portugal/Turkey/Greece rather than tier=1. It works for Scotland to get the correct tier. I've corrected that. I like to release high-quality, proofread and checked data like I have for England, Germany, Spain etc. However, realistically I don't have time to do that level of checking for all leagues. Therefore, I've added these "as is", and hopefully if people find errors or additions they can file issues and/or pull requests. Also, if these leagues had playoff games, those aren't included just yet. I've added that as a to-do item in the ReadMe. Also, all team names have been added to the teamnames dataframe. Going forward, if other seasons are added to each league, we should pick one team name for each team and stay with it. Finally, I don't know if teams changed names between 1994 and 2016 in each league. If they did, I've not noted that in teamnames - I'm assuming that each unique team name in the data is a distinct team rather than a renamed one. I'll let others find that out and let me know if I need to fix it.
Do you plan to add a function to obtain up-to-date results for the current season at any point?
It's something I regularly need so made this script to fetch results from the current season from http://football-data.co.uk and change the formatting to use with engsoccerdata. The website has freely-available CSVs of historical results that are updated twice weekly (and also results from several other European leagues if you ever want to expand the data included with the package). Feel free to use / adapt this if you can find a way to implement it in the package.
Thanks for all your work!