Horse Finishing Times #6

Closed
puntermick opened this issue Feb 8, 2019 · 5 comments

Comments

@puntermick

This is not an issue at all, just an added comment.

I noted the interesting idea you included about calculating
each horse's finishing time based on the available winner's time
and a calculation that transforms lengths beaten into
seconds etc.
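
A minimal sketch of that kind of calculation, in Python. The lengths-per-second figure is an assumption for illustration only (a realistic value depends on race type, going and distance, and the scraper's own value is not shown in this thread); `finishing_time` and `LENGTHS_PER_SECOND` are hypothetical names.

```python
# Assumed conversion rate: how many lengths a horse covers per second.
# Illustrative placeholder only; tune it per race type and going.
LENGTHS_PER_SECOND = 5.0


def finishing_time(winner_time_secs, lengths_behind_winner, lps=LENGTHS_PER_SECOND):
    """Estimate a horse's finishing time from the winner's time.

    lengths_behind_winner is the cumulative distance behind the winner,
    not the gap to the horse immediately in front.
    """
    if lps <= 0:
        raise ValueError("lengths per second must be positive")
    return winner_time_secs + lengths_behind_winner / lps


# Winner ran 60.0s; the runner-up finished 2.5 lengths behind.
print(finishing_time(60.0, 2.5))  # 60.5
```

Passing `lps` per call would let a different rate be used for, say, a 5f sprint versus a long distance chase.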

This could be the basis of something useful.
The trick for the punter may be when to ponder using it and when to ignore it.

Picture a long distance chase for example.
The last dregs of finishers won't be putting maximum effort into
finishing as fast as they can. Any time recorded
might be a dubious measure of the ability they may demonstrate in future.

Less dubious in the same race may be the times for the first three home.

Style of race may have an impact as well.
A 5f flat sprint for example would have a lot less of
the "I will just plod along at the end with minimal effort" style of impact.

Interesting that you have bothered to include such stuff. :)
Future research into the data may reveal when it is a decent metric to use and when it is not.

AND

I can envision two possible routes for this scraper stuff.

#1 - rpscraper.py: you continue to add new stuff to it, such as the above.

#2 - rpscraper.py is more focussed on pure scraping. There is only so much data on the page, and once it grabs it all without fail it is deemed perfect and set in stone.

Datatransform.py is then a 2nd script that takes the raw scrape output and creates extra fields.
Decimal odds, weights in lbs, time calcs, anything else custom calculation-wise.
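
A rough sketch of what such a second script might do, in Python. The column names (`odds`, `weight`) and the input formats (`5/2` fractional odds, `9-7` stones-pounds) are assumptions about the raw output, and all function names are hypothetical. It does not handle extras such as a favourite marker appended to the odds.

```python
from fractions import Fraction


def fractional_to_decimal(odds):
    """Convert fractional odds such as '5/2' to decimal odds (3.5)."""
    text = odds.strip().lower()
    if text in ("evens", "evs"):
        return 2.0
    numerator, denominator = text.split("/")
    return float(Fraction(int(numerator), int(denominator)) + 1)


def weight_to_lbs(weight):
    """Convert a 'stones-pounds' weight such as '9-7' to pounds (133)."""
    stones, pounds = weight.split("-")
    return int(stones) * 14 + int(pounds)


def transform(row):
    """Return a copy of a raw scrape row with derived fields added."""
    out = dict(row)
    out["decimal_odds"] = fractional_to_decimal(row["odds"])
    out["lbs"] = weight_to_lbs(row["weight"])
    return out


print(transform({"horse": "Example", "odds": "5/2", "weight": "9-7"}))
```

Keeping the derived fields in a separate step like this means the raw scrape never has to be re-run when a calculation changes.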

No right or wrong answers I guess.

@joenano
Owner

joenano commented Feb 8, 2019

You're 100% right, the times will be meaningless in some cases with horses eased down etc. Final times alone don't really mean much in European racing without the sectionals, and even less so over jumps.

I'm a flat racing man, so the jumps probably didn't enter my mind when I added the time calculation! I think you just have to use your judgement as to when they are useful and trustworthy on an individual basis. Not sure that I can do anything other than just record them.

Appreciate the feedback and ideas.

@puntermick
Author

Ta for the character encoding fix in the other thread.
I will re-download and have another whirl.

And yup re "just record them"
All you can do is make a good spade.
Up to the spade user as to whether he uses said spade for his career in ditch digging
or if he uses it thoughtlessly to dig his own grave :)

Sectionals... a growing area in the UK.
Silly scenario where some courses have them and some do not.
Really needs racing authorities to say
"By Date X all courses must provide them in agreed standard format"
and ideally
"They must be fed into a central authority database from where they will be distributed freely to all punters with open source ethos."

We live in hope if not expectation :)

Other Ideas

Ponder some future means to download by date range.
A data scientist may be content with a one off trawl for data upon which to do some study.
A punter will be more prone to seek continual daily updates.

A download by date range feature could perhaps form the backbone
of their update routine.
Some form of scheduled task could perhaps make a download-by-date-range call on a regular basis.
(If the last update date was stored somehow, that could be useful. It may facilitate an update script that needs no date parameters passed to it, which I suspect would make automated scheduling easier.)
It would reduce the inefficiency of having to scrape a whole year when only 1 day or one week etc was required. Faster for the user and more respectful to the source.
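
A minimal sketch of the stored last-update idea, in Python. The state file name and both function names are hypothetical, and the actual scrape call is left out; this only works out which dates still need fetching.

```python
import datetime as dt
from pathlib import Path

STATE_FILE = Path("last_update.txt")  # hypothetical state file


def dates_to_update(today, state_file=STATE_FILE, default_start=None):
    """Return the dates not yet scraped, up to and including yesterday."""
    yesterday = today - dt.timedelta(days=1)
    if state_file.exists():
        # Resume from the day after the last successful update.
        last = dt.date.fromisoformat(state_file.read_text().strip())
        start = last + dt.timedelta(days=1)
    else:
        # First run: fall back to a given start date, or just yesterday.
        start = default_start or yesterday
    days = (yesterday - start).days + 1
    return [start + dt.timedelta(days=i) for i in range(max(days, 0))]


def mark_updated(day, state_file=STATE_FILE):
    """Record the most recent date that was scraped successfully."""
    state_file.write_text(day.isoformat())


# Intended use from a scheduled task (scrape step omitted):
#   for day in dates_to_update(dt.date.today()):
#       ...scrape that day...
#       mark_updated(day)
```

Because the script reads its own state, a scheduler could call it with no arguments, and a missed run is caught up on the next one.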

Perhaps even useful if future scrape debugging was needed for a weird page.
Date range would let you more closely target the troublesome spot etc.

Just a few brainstorm ideas.
I'm sure you have more ideas than you have spare time :)

@puntermick
Author

PS line 146

int(year) < 2019

To permit 2019 data download, is it as simple as changing that to 2020, or does anything else need to be done?

@joenano
Owner

joenano commented Feb 9, 2019

Yeah, we are lagging behind the rest of the world when it comes to timing, probably by design; the bookies have more influence here than elsewhere.

Scraping by date range would require a different method, both in getting the individual race urls, and in storing the data. The current method gets every race url from a given year at a given track with one request. To scrape by date range, every date would need to be scraped individually, which is easy enough, but I opted for the more efficient method when making the tool as it was more about historical data.
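
The per-date approach could be sketched roughly like this in Python. It is only an outline of the loop, not the tool's code: `scrape_day` is a hypothetical callback standing in for whatever fetches one day's races, and the stand-in used here returns made-up ids.

```python
import datetime as dt


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += dt.timedelta(days=1)


def scrape_range(start, end, scrape_day):
    """Call scrape_day once per date and collect the results in order."""
    results = []
    for day in date_range(start, end):
        results.extend(scrape_day(day))
    return results


# Stand-in for a real per-day scrape: returns one fake race id per date.
def fake_scrape_day(day):
    return [f"race-{day.isoformat()}"]


print(scrape_range(dt.date(2019, 2, 7), dt.date(2019, 2, 9), fake_scrape_day))
```

This costs one call per date rather than one per year and track, which is the trade-off described above.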

I will get the scrape by date working as a separate script and see where it goes, as I think it's a good idea. I'm currently working on a different project and I haven't thought about this in a while, but you have given me some motivation for it.

And yes, just increasing that will increase the valid year range. I have updated that to 2020.
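
One possible way to avoid bumping that number every January, sketched in Python. This is an assumption about how the check could be written, not the script's actual code; `year_allowed` is a hypothetical name and only the upper bound is covered.

```python
import datetime as dt


def year_allowed(year, today=None):
    """Return True if the requested year is not in the future.

    Plays the same role as a check like int(year) < 2020, but takes the
    limit from the current date instead of a hard-coded number.
    """
    today = today or dt.date.today()
    return int(year) <= today.year
```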

@puntermick
Author

Bookies with more influence... yup.
I thought it a sad day when racing authorities opted to go for a % of bookie profit share instead of a % of turnover as per Ireland.
It shifted UK racing bodies to be very much on the bookies' side and against punters.
Turnover % would have positioned them as more neutral. That would be better from a longer term PR perspective. As it is, they are now condoning accomplices to poor bookie behavior towards racing punters.

They may face a degree of suffering as well from shifty bookie accountancy tricks, which are easier to do on net profit than turnover. Racing as a loss leader etc.

Water under the bridge that one though.

As for your motivation levels..

I suspect they will rise as your favored flat season approaches :)

What you have got so far here is brilliant as is.

@joenano joenano closed this as completed Feb 11, 2019