Converts statcast over to using the try_parse code that autodetermines data types #106

TheCleric · 2020-09-05T03:52:14Z

This PR was done in three steps:

1. Create unit tests for existing statcast code
2. Migrate statcast to using the try_parse postprocessor to autodetermine data types
3. Ensure unit tests run successfully

When merged, this PR should fix the following issues: #105, #104, #59
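For readers who have not seen the postprocessor, here is a minimal sketch of what "autodetermining" a data type could look like for a single value. It is not the library's implementation: `try_parse_value` and `DATE_FORMATS` are hypothetical names, and the first date format is an assumed example. Only the second format string appears in this PR's diff.

```python
from datetime import datetime
from typing import List, Union

# Date formats to try, in order. The second entry is the one added in this
# PR's diff; the first is an assumed example, not the library's actual list.
DATE_FORMATS: List[str] = [
    '%Y-%m-%d',
    '%Y-%m-%dT%H:%M:%S.%fZ',
]


def try_parse_value(value: str) -> Union[str, int, float, datetime]:
    """Hypothetical helper: return the most specific type the string parses as."""
    # Try the numeric types first, narrowest first.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    # Then try each known date format.
    for date_format in DATE_FORMATS:
        try:
            return datetime.strptime(value, date_format)
        except ValueError:
            pass
    # Nothing matched, so keep the original string.
    return value
```

Under these assumptions, a string such as `"95.3"` comes back as a float, an ISO-style timestamp as a `datetime`, and anything unrecognized stays a string.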

schorrm · 2020-09-06T09:15:46Z

pybaseball/datahelpers/postprocessing.py

+    '%Y-%m-%dT%H:%M:%S.%fZ', # Just in case (https://github.com/jldbc/pybaseball/issues/104)
+]
+
+def try_parse_dataframe(


Can we make this fit on one line? Maybe by defining something for Union[str, int, float, datetime]?

Define ColType = Union[str, int, float, datetime] somewhere, and annotate with ColType?
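A rough sketch of how the suggested alias might look in practice. The parameter names follow the signature quoted in this thread; the function body is a placeholder, `float("nan")` stands in for `np.nan`, and using `None` instead of `[]` as the default is an assumption made here to avoid a shared mutable default. pandas is imported only for type checking so that the sketch runs without it.

```python
from datetime import datetime
from typing import TYPE_CHECKING, List, Optional, Union

if TYPE_CHECKING:  # only type checkers need pandas for this sketch
    import pandas as pd

# The suggested alias: one name for the union of allowed replacement types.
ColType = Union[str, int, float, datetime]


def try_parse_dataframe(
        data: "pd.DataFrame",
        null_replacement: ColType = float("nan"),  # stand-in for np.nan
        known_percentages: Optional[List[str]] = None,  # None rather than [] (assumption)
) -> "pd.DataFrame":
    """Placeholder body: the real parsing logic is not shown in this thread."""
    return data
```

The alias shortens the `null_replacement` annotation, though the full signature may still be too long for a single line under common line-length limits.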

Do you find it less readable on multiple lines? I'll put it on one line if desired, but I find it more readable this way. You can see almost instantly how many parameters it takes, what types they are, and which have defaults. On one line, they tend to run together, in my opinion.

Thoughts?

I also dug through the PEP to see what the preferred way to handle this is, and found this discussion:

python/typing#433

There they suggested an alternative, which would look like this if you'd prefer (though I personally find it less readable):

def try_parse_dataframe(data: pd.DataFrame, null_replacement: Union[str, int, float, datetime] = np.nan, known_percentages: List[str] = []) -> pd.DataFrame:

schorrm · 2020-09-08T10:22:42Z

LGTM

…s data types (#106) * Add unit tests for statcast to ensure backwards compatibility * Convert statcast to using try_parse * Add some missing testing

TheCleric added 11 commits August 31, 2020 12:00

Merge pull request #2 from jldbc/master

8c2c444

Fixed very stupid typo in README

Merge pull request #3 from jldbc/master

2cf2640

Merge master in

Merge pull request #4 from jldbc/master

76819b7

Team Batting (FanGraphs) => Add unit tests; use new consolidated data…

Merge pull request #5 from jldbc/master

d94cbdf

team_pitching to new FanGraphs datasource (jldbc#96)

Add unit tests for statcast to ensure backwards compatibility

8d2d014

Convert statcast to using try_parse

9e958f5

Fix bad type declaration

1fdeadd

Add some missing testing

8fd925a

Forgot to remove print statements

b04526c

Add test data to .gitattribute to exclude from stats

db01d6e

Remove inadvertant file add

a4d3c5a

schorrm reviewed Sep 6, 2020

schorrm merged commit f45c4ca into jldbc:master Sep 8, 2020

This was referenced Sep 8, 2020

Add date support to prostproccess.try_parse #105

Closed

Set correct data types in Statcast data #59

Closed

TheCleric deleted the feature/statcast_use_try_parse branch September 11, 2020 14:14
