Working with specific kind of data #47

Closed
pachevalier opened this issue Feb 16, 2016 · 1 comment

Comments

@pachevalier

I like dplyr and tidyr. Those packages are really useful for working with conventional tabular data sets. However, we should add recommendations for working with more specialized data, such as geographical data (polygons of cities, countries, etc.) or network data.

For geographical data, SpatialPolygonsDataFrame objects aren't easy to manipulate. For instance, filtering a SpatialPolygonsDataFrame isn't very convenient. Converting a SpatialPolygonsDataFrame to a data frame (what we do with fortify to draw polygons using ggplot2) isn't the best solution in terms of memory usage. So we might be able to find something else and offer good recommendations for data scientists.

I recently had to work with network data. It was also very difficult to find the right structure for my data. Imagine I have a data set where the first column holds each node and the second column holds the list of groups that node belongs to. I want a data set with one row for each relationship between two nodes (I assume that if node A and node B both belong to group 1, they have one relation). Standard tools such as tidyr are not really designed for that kind of usage.
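To make the reshaping above concrete, here is a minimal sketch of going from a node-to-groups table to an edge list. It is written in plain Python rather than R, purely to illustrate the shape of the transformation. The `memberships` data and the `edges_from_memberships` helper are hypothetical. Counting one relation per shared group, so that two nodes sharing two groups get a weight of 2, is an assumption that extends the "one relation per common group" rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one entry per node, with the list of groups it belongs to.
memberships = {
    "A": [1, 2],
    "B": [1],
    "C": [1, 2],
    "D": [3],
}


def edges_from_memberships(memberships):
    """Map each unordered node pair to the number of groups the two nodes share."""
    # Step 1: invert the table so each group points to its set of member nodes.
    members_by_group = {}
    for node, groups in memberships.items():
        for group in set(groups):  # set() guards against a group listed twice
            members_by_group.setdefault(group, set()).add(node)

    # Step 2: every pair of nodes inside the same group gets one relation.
    # Sorting the members makes each pair appear in a single canonical order.
    edges = Counter()
    for members in members_by_group.values():
        for u, v in combinations(sorted(members), 2):
            edges[(u, v)] += 1
    return edges


edges = edges_from_memberships(memberships)
for (u, v), weight in sorted(edges.items()):
    print(u, v, weight)
```

With this sample data, A and C share two groups, A and B share one, B and C share one, and D has no relations because it is alone in its group. The result is one row per pair of related nodes, which is the structure described above.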

I think these kinds of data are used very often by data scientists, and this book should also address those issues.

@hadley
Owner

hadley commented Feb 16, 2016

That is unfortunately outside the scope of this book. See the intro.

hadley closed this as completed Feb 16, 2016