You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I like dplyr and tidyr. Those packages are really useful to work with conventional tabular data sets. However, we should add recommandations to work with specific data sets such as geographical data (polygons of cities, countries, etc) or network data.
For geographical data, SpatialPolygonsDataFrame aren't easy to manipulate. For instance, it's not very handy to filter a SpatialPolygonDataFrame. Converting SpatialPolygonDataFrame to data frames (what we do with fortify to draw polygons using ggplot2) isn't the best solution for memory usage. SO we might be able to find something else and have good recommandations for data-scientists.
I recently had to work with network data. It was also very difficult to find the good structure for my data. Imagine I have a dataset with in the first column the set of each node and in the second column a list of groups the node belongs to. I want to have a data set with one line for each relationship between two nodes (I assume that if node A and node B belong to group 1, they have 1 relation). Standard tools such as tidyr are not really done for that kind of usage.
I think that those kind of data are very often used by data-scientists and this book should also address those issues.
The text was updated successfully, but these errors were encountered:
I like
dplyr
andtidyr
. Those packages are really useful to work with conventional tabular data sets. However, we should add recommandations to work with specific data sets such as geographical data (polygons of cities, countries, etc) or network data.For geographical data, SpatialPolygonsDataFrame aren't easy to manipulate. For instance, it's not very handy to filter a SpatialPolygonDataFrame. Converting SpatialPolygonDataFrame to data frames (what we do with
fortify
to draw polygons usingggplot2
) isn't the best solution for memory usage. SO we might be able to find something else and have good recommandations for data-scientists.I recently had to work with network data. It was also very difficult to find the good structure for my data. Imagine I have a dataset with in the first column the set of each node and in the second column a list of groups the node belongs to. I want to have a data set with one line for each relationship between two nodes (I assume that if node A and node B belong to group 1, they have 1 relation). Standard tools such as
tidyr
are not really done for that kind of usage.I think that those kind of data are very often used by data-scientists and this book should also address those issues.
The text was updated successfully, but these errors were encountered: