Collect feature stats #30
I'm thinking maybe an extra … Optionally, we could also let users opt in to a subset of transformers, but that adds complexity. On the other hand, I'm not sure we can compute stats before transformation, since that doesn't make sense for all inputs, e.g. strings and vectors.
FWIW here are a few things I've commonly checked in the past:
@yonromai questions:
So the pre-transform stats are potentially doable in the same … @richwhitjr, do you think it's worth doing the pre-transform stats in the same …?
This seems complex, and it would be hard to mix monoids as needed. For example, the transformation may want a QTree, but for stats you will need a Moment Monoid. An ad hoc "analysis" phase sounds promising, though. I wonder whether this will require another type of Spec, or whether users will expect the same type of stats for the same transformers.
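To illustrate why stats want a different monoid than a transformation might, here is a minimal, dependency-free sketch of a moments aggregate (count, mean, and sum of squared deviations) whose combine step is associative, so partial results from different partitions can be merged in any grouping. Algebird (where QTree comes from) has its own moments type; this sketch deliberately rolls its own, and the `Moments`, `plus`, and `summarize` names here are hypothetical, not Featran or Algebird API. The merge uses the standard parallel update for mean and variance.

```scala
// Hypothetical stats aggregate: count, running mean, and M2 (sum of squared deviations).
final case class Moments(count: Long, mean: Double, m2: Double) {
  // Population variance; 0.0 for an empty aggregate.
  def variance: Double = if (count > 0L) m2 / count else 0.0
}

object Moments {
  // Identity element: combining with zero leaves the other side unchanged.
  val zero: Moments = Moments(0L, 0.0, 0.0)

  // Lift a single observation into the aggregate.
  def of(x: Double): Moments = Moments(1L, x, 0.0)

  // Associative merge of two partial aggregates (parallel mean/variance update).
  def plus(a: Moments, b: Moments): Moments =
    if (a.count == 0L) b
    else if (b.count == 0L) a
    else {
      val n = a.count + b.count
      val delta = b.mean - a.mean
      val mean = a.mean + delta * b.count / n
      val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
      Moments(n, mean, m2)
    }

  // Fold a local collection; in a distributed job each partition would do this,
  // and the partial results would then be merged with `plus`.
  def summarize(xs: Seq[Double]): Moments =
    xs.map(of).foldLeft(zero)(plus)
}
```

Because `plus` is associative with identity `zero`, merging `summarize(Seq(1.0, 2.0))` with `summarize(Seq(3.0, 4.0))` gives the same mean (2.5) and variance (1.25) as summarizing all four values at once. That is what lets such an aggregate run alongside, but separately from, whatever aggregator a transformer uses.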
It could be nice to have it output the protobuf format required by Facets, so that we get the feature visualizations for free. See: https://github.com/PAIR-code/facets/blob/master/facets_overview/proto/feature_statistics.proto
@marcromeyn Facets seems to support a lot more than we discussed here. Just checking whether we can drop some of it to narrow the scope.
It seems like it could be a lot of work to replicate all the logic in Facets. I'm wondering whether it's easier and better to just sample in Featran and do the statistics summarization in Facets. @marcromeyn @yonromai @richwhitjr thoughts?
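If Featran only sampled and left summarization to Facets, one common way to take a fixed-size uniform sample from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below assumes that approach; `Sampling.reservoirSample` is a hypothetical helper, not part of Featran, and how the sampled rows would then be encoded for Facets is not shown.

```scala
import scala.util.Random
import scala.collection.mutable.ArrayBuffer

object Sampling {
  // Algorithm R: keep a uniform random sample of at most k items from an iterator.
  def reservoirSample[T](items: Iterator[T], k: Int, rng: Random): Vector[T] = {
    require(k >= 0, "k must be non-negative")
    val reservoir = ArrayBuffer.empty[T]
    var seen = 0L
    items.foreach { x =>
      seen += 1
      if (reservoir.size < k) {
        // Fill the reservoir with the first k items.
        reservoir += x
      } else {
        // Draw j roughly uniformly from [0, seen); replace slot j if it falls
        // inside the reservoir, i.e. with probability about k / seen.
        val j = (rng.nextDouble() * seen).toLong
        if (j < k) reservoir(j.toInt) = x
      }
    }
    reservoir.toVector
  }
}
```

In a distributed setting each worker could keep its own reservoir, but merging reservoirs correctly requires weighting by how many items each worker saw, so a single-pass local sample like this is the simplest starting point.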
I like the idea of keeping the statistics summarization internal, but making it easy for someone to take the stats and dump them into something like Facets. In the future we could have a subproject to help do this in one step. I just worry about the serialization and dependency issues we may run into when introducing a new library, since Featran has to support a lot of different distributed systems.
It turns out that Facets has the ability to import …
It's easier to do this in TFDV (TensorFlow Data Validation), so I'm closing this.
Could be useful for debugging. A couple of thoughts: