Error: arguments imply differing number of rows #15

Closed
MayaGans opened this Issue Sep 15, 2018 · 9 comments


MayaGans commented Sep 15, 2018

Hi! I'm trying to load in Strava data and I get the following error when trying to run the process_data function:

Error in data.frame(lat = lat, lon = lon, ele = ele, time = time, id = id) : arguments imply differing number of rows: 1366, 0, 1

I'm not familiar with the structure of .gpx files so I was hoping you could shed some insight?

Thank you!

Owner

marcusvolz commented Sep 15, 2018

Hi Maya, I fixed an issue with process_data yesterday, so try updating to the latest package. Then in your current working directory create a sub-directory called "data" and put all the gpx files in there, then execute process_data("data").
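Before calling process_data, it can help to confirm that the folder really contains .gpx files. The lines below are only a rough sketch of that check: a temporary folder stands in for the "data" sub-directory, and both file names are placeholders.

```r
# A temporary folder stands in for the "data" sub-directory (assumption).
gpx_dir <- tempfile("data")
dir.create(gpx_dir)

# Placeholder files so the example is self-contained.
invisible(file.create(file.path(gpx_dir, c("1836025202.gpx", "notes.txt"))))

# List only the files whose names end in .gpx.
gpx_files <- list.files(gpx_dir, pattern = "\\.gpx$", full.names = TRUE)
print(basename(gpx_files))

# With the strava package installed, the next step would be:
# data <- process_data(gpx_dir)
```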

MayaGans commented Sep 16, 2018

Hi @marcusvolz, thanks for getting back to me so quickly! I did as you said and now get this error:

Error in mutate_impl(.data, dots) : Column id must be length 0 (the number of rows) or one, not 3

Any ideas?

Owner

marcusvolz commented Sep 16, 2018

It looks like it is trying to process three files but is returning empty data for each file. I'm not sure that I can be of much help without seeing the gpx files. I noticed Strava no longer allows bulk export of gpx files, so maybe your files are in a different format that process_data isn't set up for. The only other thing I can suggest is upgrading to the latest version of R if you haven't already.

Contributor

hugovk commented Sep 16, 2018

I also saw the first error.

When you upload to Strava, they make "corrections" on the file, for example to try and fix rogue elevation data.

Pre-GDPR, you could bulk export all your activities as GPX files. I think these were the corrected ones.

These visualisation tools were fine with them, maybe Strava standardised them.

Post-GDPR, you can export an archive of your account which includes much more data, however your activities are now in their original file format. For me, that was GPX for GPX files I uploaded from another service, and from the Strava Android app, and FIT format from my Wahoo GPS tracker.

These tools were fine with the GPX files. I tried converting the FIT files to GPX using GPSBabel, and also FIT-to-GPX, but these tools gave Error in data.frame for them. In the end, I exported those activities from the Strava website (here's a script to download several).

I think Strava should probably still be offering the corrected tracks under GDPR.

I'll retest with the latest version of this package.

Contributor

hugovk commented Sep 16, 2018

macOS High Sierra 10.13
R 3.5.1
Latest version of this package: b98010a
devtools 1.13.6
mapproj 1.2.6
tidyverse 1.2.1

Here's a test using a FIT file converted to GPX using:

  • GPSBabel
  • FIT-to-GPX
  • Strava export

testing.zip contains the input FIT file, the three GPX files, and the output plots.

$ R

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(strava)
> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

> data <- process_data("fit2gpx")
Error in data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type) :
  arguments imply differing number of rows: 10754, 10715, 1

> data <- process_data("gpsbabel")
Error in data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type) :
  arguments imply differing number of rows: 10754, 10715, 1

> data <- process_data("strava-export")
> p1 <- plot_facets(data)
> ggsave("plots/facets-strava-export.png", p1, width = 20, height = 20, units = "cm")
> p2 <- plot_map(data, lon_min = 22.65, lon_max = 26.65, lat_min = 59.61, lat_max = 60.84) # Uusimaa
> ggsave("plots/map-strava-export.png", p2, width = 20, height = 15, units = "cm", dpi = 600)
> p3 <- plot_elevations(data)
> ggsave("plots/elevations-strava-export.png", p3, width = 20, height = 20, units = "cm")
>

Both the problem files have 10754 <trkpt>s (the good one has 10787).

MayaGans commented Sep 16, 2018

Thanks for the tips! FTR, you can still bulk download GPX files under Settings > My Account > Get Started > Download Request, but it's unfortunately still not working. I ran @hugovk's code with the test file and that did work... :/

I've attached one of my GPX files (zipped, since GitHub apparently doesn't accept .gpx). I have ZERO familiarity with their structure. Would you mind taking a peek to see if they're in the correct format? The extension is .gpx, but maybe I need to convert them?

1969459350.gpx.gz

Thanks again!!

Contributor

hugovk commented Sep 16, 2018

Settings > My Account > Get Started > Download Request is the same bulk export I mentioned, but it took quite a few hours for the email to arrive for me. I had assumed it wasn't working!

I also get an error with that attached file:

> data <- process_data("/tmp/Downloads/1969459350")
Error in data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type) :
  arguments imply differing number of rows: 1366, 0, 1

It is indeed in GPX format, which is a type of XML file.


Here's an idea.

My working exported file is made up of points like this:

			<trkpt lat="60.1756950" lon="24.9218050">
				<ele>18.8</ele>
				<time>2018-06-01T13:19:48Z</time>
				<extensions>
					<gpxtpx:TrackPointExtension></gpxtpx:TrackPointExtension>
				</extensions>
			</trkpt>

Not working 1969459350.gpx is like this:

			<trkpt lat="40.6627840" lon="-73.9653840">
				<time>2018-07-25T23:11:06+00:00</time>
				<extensions>
					<gpxtpx:TrackPointExtension>
						<gpxtpx:cad>0</gpxtpx:cad>
					</gpxtpx:TrackPointExtension>
				</extensions>
			</trkpt>

Not working 1730574024.fit.fit2gpx.gpx:

			<trkpt lat="60.175695056" lon="24.921805004">
				<time>2018-06-01T13:20:21Z</time>
				<speed>0.000000</speed>
			</trkpt>

Not working 1730574024.fit.gpsbabel.gpx:

			<trkpt lat="60.175695056" lon="24.921805004">
				<time>2018-06-01T13:20:21Z</time>
				<speed>0.000000</speed>
			</trkpt>

  • The working one has lat, lon, ele (elevation) and time.
  • The non-working ones only have lat, lon and time.

strava/R/process_data.R

Lines 20 to 32 in b98010a

type <- str_match(file, ".*-(.*).gpx")[[2]]
# Check for empty file.
if (length(coords) == 0) return(NULL)
# dist_to_prev computation requires that there be at least two coordinates.
if (ncol(coords) < 2) return(NULL)
lat <- as.numeric(coords["lat", ])
lon <- as.numeric(coords["lon", ])
ele <- as.numeric(XML::xpathSApply(pfile, path = "//trkpt/ele", XML::xmlValue))
time <- XML::xpathSApply(pfile, path = "//trkpt/time", XML::xmlValue)
# Put everything in a data frame
result <- data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type) %>%

Elevation is obviously needed for plot_elevations, but perhaps it can be set to None/null/empty when it's not available (and skipped in plot_elevations), so the other plot functions can still work.
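One way this could look is to read each track point on its own, so that a missing ele turns into NA instead of shortening the ele vector. This is only a minimal sketch, not the package's code: the two-point GPX string and the child_value helper are made up for illustration, and the sample leaves out the GPX namespace so that plain XPath names match.

```r
library(XML)

# Hypothetical two-point track; the second point has no <ele>.
gpx <- '<gpx><trk><trkseg>
  <trkpt lat="60.1756950" lon="24.9218050"><ele>18.8</ele><time>2018-06-01T13:19:48Z</time></trkpt>
  <trkpt lat="60.1756951" lon="24.9218051"><time>2018-06-01T13:20:21Z</time></trkpt>
</trkseg></trk></gpx>'

pfile  <- xmlParse(gpx, asText = TRUE)
points <- getNodeSet(pfile, "//trkpt")

# Text of a named child of one track point, or NA when the child is absent.
child_value <- function(point, name) {
  value <- xpathSApply(point, paste0("./", name), xmlValue)
  if (length(value) == 0) NA_character_ else value[[1]]
}

# Every column is built per point, so all columns have the same length.
result <- data.frame(
  lat  = as.numeric(vapply(points, xmlGetAttr, character(1), name = "lat")),
  lon  = as.numeric(vapply(points, xmlGetAttr, character(1), name = "lon")),
  ele  = as.numeric(vapply(points, child_value, character(1), name = "ele")),
  time = vapply(points, child_value, character(1), name = "time"),
  stringsAsFactors = FALSE
)
print(result)
```

plot_elevations would then need to skip or drop the NA rows, while the other plots could use lat and lon as before.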


There may also be a problem with getting type, which assumes the files are named like 20180912-064451-Ride.gpx, as they were pre-GDPR, but they're now like 1836025202.gpx from Strava (or whatever from other sources).
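To see what the filename pattern does with both naming styles, here is a small base-R sketch. It reuses the pattern from process_data.R but swaps stringr's str_match for regexec/regmatches so it runs without extra packages; type_from_filename is a hypothetical helper, not part of the package.

```r
# Same pattern as in process_data.R, applied with base R instead of stringr.
type_from_filename <- function(filename) {
  hit <- regmatches(filename, regexec(".*-(.*).gpx", filename))[[1]]
  # hit holds the full match and the captured group, or is empty on no match.
  if (length(hit) == 2) hit[[2]] else NA_character_
}

old_style <- type_from_filename("20180912-064451-Ride.gpx")
new_style <- type_from_filename("1836025202.gpx")

print(old_style)
print(new_style)
```

The old-style name yields "Ride", while the numeric name has no hyphen and so no match. As far as I know, str_match also returns NA when nothing matches, so type would just be NA for the new names rather than an error.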

type can be used for colour coding plots (like in #13). I don't think anything will fail if this value isn't present. If it's not found in the filename, it can be sometimes found in the top of a file:

1730574024.strava-export.gpx:

		<name>Afternoon Ride</name>
		<type>1</type>

1969459350.gpx:

		<name>TODO</name>
		<type>Run</type>
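
A fallback that reads the type from the file could be sketched like this. It assumes the first type element anywhere in the document is the activity type, and the sample strings again omit the GPX namespace; type_from_file is a hypothetical helper.

```r
library(XML)

# First <type> element in a parsed GPX document, or NA if there is none.
type_from_file <- function(pfile) {
  found <- xpathSApply(pfile, "//type", xmlValue)
  if (length(found) > 0) found[[1]] else NA_character_
}

# Minimal stand-ins for a file with and without a <type> element.
with_type    <- xmlParse("<gpx><trk><name>TODO</name><type>Run</type></trk></gpx>", asText = TRUE)
without_type <- xmlParse("<gpx><trk><name>TODO</name></trk></gpx>", asText = TRUE)

print(type_from_file(with_type))
print(type_from_file(without_type))
```

Note that the value is not consistent between sources (1 in the Strava export, Run in the original file), so it would probably need mapping before being used for colour coding.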

MayaGans commented Sep 16, 2018

I really just wanted to generate the small multiple plots, by taking out the ele = ele argument within the plot dist function I was able to do so -- THANK YOU!

MayaGans closed this Sep 16, 2018

Contributor

hugovk commented Sep 17, 2018

Correcting myself, when I said:

  • The working one has lat, lon, ele (elevation) and time.
  • The non-working ones only have lat, lon and time.

My two converted files do have some ele values, but not every track point has one. There are 10754 points (with lat, lon, time), and only 10715 of those have ele values. Maya's file has 1366 points with 0 ele values. We can see these pairs of numbers in the error messages:

Error in data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type) :
  arguments imply differing number of rows: 10754, 10715, 1

Strava makes altitude corrections, that'll be why their exports all have points with elevation values but some original files do not.
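
The same failure can be reproduced in miniature with plain vectors. This is a made-up three-point example in base R, not real track data: ele is one value short, just as it is in the converted files.

```r
# Hypothetical three-point track in which only two points carry elevation.
lat  <- c(60.1757, 60.1758, 60.1759)
lon  <- c(24.9218, 24.9219, 24.9220)
ele  <- c(18.8, 19.1)  # one elevation value is missing
time <- c("2018-06-01T13:19:48Z", "2018-06-01T13:19:49Z", "2018-06-01T13:19:50Z")
type <- "Ride"

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(lat = lat, lon = lon, ele = ele, time = time, type = type)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message should list the distinct lengths in argument order: 3 for lat, lon and time, 2 for ele, and 1 for type, which mirrors the 10754, 10715, 1 above.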
