Better data type detection for pre_aggregated, indexed dataframes #61

Closed
dorisjlee opened this issue Aug 13, 2020 · 1 comment
Labels: bug (Something isn't working), easy (Easy to fix; Good issues for newcomers)


dorisjlee commented Aug 13, 2020

When a dataframe is pre-aggregated, our cardinality-based type detection often fails to detect the type correctly. For example, when the dataset is small (as is often the case for pre-aggregated data), nominal fields get recognized as quantitative.
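To make the failure mode concrete, here is a minimal sketch of a cardinality-based heuristic. It is not Lux's actual implementation: the function name `guess_type`, the thresholds `max_ratio` and `max_unique`, and the toy data are all assumptions made for illustration. It shows how the same field can be labelled nominal in the raw data and quantitative once the data is aggregated, because every value becomes unique.

```python
import pandas as pd


def guess_type(series: pd.Series, max_ratio: float = 0.4, max_unique: int = 20) -> str:
    """Hypothetical cardinality heuristic; the thresholds are illustrative assumptions."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "temporal"
    if pd.api.types.is_numeric_dtype(series):
        n_unique = series.nunique()
        ratio = n_unique / max(len(series), 1)
        # Few distinct values relative to the row count suggests a category.
        if ratio < max_ratio and n_unique < max_unique:
            return "nominal"
        return "quantitative"
    return "nominal"


# Toy stand-in for the car dataset (made-up values).
raw = pd.DataFrame(
    {
        "Cylinders": [4, 4, 4, 6, 6, 6, 8, 8, 8, 8],
        "Horsepower": [90.0, 88.0, 95.0, 110.0, 105.0, 120.0, 150.0, 165.0, 170.0, 160.0],
    }
)
agg = raw.groupby("Cylinders").mean().reset_index()

raw_type = guess_type(raw["Cylinders"])  # 3 distinct values over 10 rows
agg_type = guess_type(agg["Cylinders"])  # 3 distinct values over 3 rows
print(raw_type, agg_type)
```

After aggregation the ratio of distinct values to rows is 1.0, so a ratio-based rule can no longer tell a category from a measure.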

import pandas as pd
import lux  # needed for Lux attributes such as data_type

df = pd.read_csv("lux/data/car.csv")
df["Year"] = pd.to_datetime(df["Year"], format="%Y")  # convert the "Year" column to a datetime dtype
a = df.groupby("Cylinders").mean()

a.data_type

As a related issue, we should also support type detection for named indexes. In this example, Cylinders is the index, so its data type is not computed.
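One possible way to bring a named index into type detection is to promote it to a regular column first. The sketch below uses only pandas; the helper name `with_named_index_as_columns` and the toy data are hypothetical, and this is not necessarily how the eventual fix works.

```python
import pandas as pd


def with_named_index_as_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: promote named index levels to ordinary columns."""
    named_levels = [name for name in df.index.names if name is not None]
    if not named_levels:
        # A default, unnamed index carries no attribute worth typing.
        return df
    return df.reset_index(level=named_levels)


# Toy stand-in for the car dataset (made-up values).
raw = pd.DataFrame(
    {"Cylinders": [4, 4, 6, 6, 8, 8], "Horsepower": [90.0, 88.0, 110.0, 105.0, 150.0, 165.0]}
)
agg = raw.groupby("Cylinders").mean()  # "Cylinders" is now the index

flat = with_named_index_as_columns(agg)
print(list(agg.columns))   # index is not among the columns
print(list(flat.columns))  # index has been promoted to a column
```

Note that `reset_index` raises a `ValueError` if an index level shares its name with an existing column, so a real implementation would need to handle that case.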

@dorisjlee dorisjlee added bug Something isn't working easy Easy to fix; Good issues for newcomers labels Aug 13, 2020
@dorisjlee dorisjlee changed the title Better data type detection for pre_aggregated dataframes Better data type detection for pre_aggregated, indexed dataframes Aug 13, 2020
@jinimukh jinimukh added this to the S1: January 2021 milestone Jan 15, 2021
dorisjlee commented

Closing this after #287 is merged in. Great work Kunal!
