Working with specific kinds of data #47

pachevalier · 2016-02-16T16:02:16Z

I like dplyr and tidyr. Those packages are really useful for working with conventional tabular data sets. However, we should add recommendations for working with specific kinds of data, such as geographical data (polygons of cities, countries, etc.) or network data.

For geographical data, SpatialPolygonsDataFrame objects aren't easy to manipulate. For instance, it's not very handy to filter a SpatialPolygonsDataFrame. Converting a SpatialPolygonsDataFrame to a data frame (what we do with fortify to draw polygons using ggplot2) isn't the best solution in terms of memory usage. So we might be able to find something better and give good recommendations to data scientists.

I recently had to work with network data, and it was also very difficult to find the right structure for my data. Imagine a dataset where the first column holds each node and the second column holds the list of groups that node belongs to. I want a data set with one row for each relationship between two nodes (I assume that if node A and node B both belong to group 1, they have one relation). Standard tools such as tidyr aren't really designed for that kind of task.
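
A minimal sketch of that reshaping, written in Python rather than R for illustration. It assumes the input is a mapping from each node to its list of groups, and that a pair of nodes sharing several groups gets one row per shared group (the issue doesn't say how multiple shared groups should be counted). The function name `memberships_to_edges` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def memberships_to_edges(memberships):
    """Turn {node: [groups]} into a list of (node_a, node_b, group) rows.

    Assumption: every unordered pair of nodes that shares a group gets one
    row per shared group. Nodes and groups must be sortable so the output
    order is deterministic.
    """
    # Invert the table: group -> set of member nodes
    members = defaultdict(set)
    for node, groups in memberships.items():
        for group in groups:
            members[group].add(node)

    edges = []
    for group in sorted(members):
        # Each unordered pair of members of this group is one relation
        for a, b in combinations(sorted(members[group]), 2):
            edges.append((a, b, group))
    return edges


data = {"A": [1, 2], "B": [1], "C": [2, 3], "D": [3]}
print(memberships_to_edges(data))
```

Inverting the table first (group to members) keeps pair generation local to each group. In R, a similar shape could plausibly be reached by unnesting the list column and then self-joining on the group column.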

I think these kinds of data are used very often by data scientists, and this book should also address those issues.

hadley · 2016-02-16T20:02:43Z

That is unfortunately outside the scope of this book. See the intro.

hadley closed this as completed Feb 16, 2016