Project dataset export missing headers #8141

garethrees · 2024-02-27T10:03:32Z

Project::Export uses the first request in the project to generate the CSV headers. However, if the request hasn't had data extracted, the headers for the dataset columns will be blank, even though other requests in the project may have had a dataset extraction.

project = Project.find(28)
# => #<Project:0x00007fc4d2992e58

Project::Export::InfoRequest.new(project, project.info_requests.first).data
# =>
# => {:request_url=>"https://www.whatdotheyknow.com/request/[REDACTED]",
# =>  :request_title=>"[REDACTED]",
# =>  :public_body_name=>"Blaenau Gwent County Borough Council",
# =>  :request_owner=>"[REDACTED]",
# =>  :latest_status_contributor=>"[REDACTED]",
# =>  :status=>"partially_successful",
# =>  :dataset_contributor=>nil}

Project::Export::InfoRequest.new(project, project.info_requests.last).data
# =>
# => {:request_url=>"https://www.whatdotheyknow.com/request/[REDACTED]",
# =>  :request_title=>"[REDACTED]",
# =>  :public_body_name=>"Wyre Forest District Council",
# =>  :request_owner=>"[REDACTED]",
# =>  :latest_status_contributor=>"[REDACTED]",
# =>  :status=>"partially_successful",
# =>  :dataset_contributor=>"[REDACTED]",
# =>  "Was this [REDACTED]"=>"1",
# =>  "Were [REDACTED]"=>"0",
# =>  "What reasons [REDACTED]"=>"[REDACTED]",
# =>  "How many years [REDACTED]"=>"[REDACTED]",
# =>  "Total number of [REDACTED]"=>"[REDACTED]",
# =>  "Total number of [REDACTED]"=>"[REDACTED]",
# =>  "How many [REDACTED]"=>"[REDACTED]",
# =>  "Was the [REDACTED]"=>"[REDACTED]"}

garethrees · 2024-02-27T10:10:47Z

As a quick hack we could pick the hash with the most keys as the header:

  def to_csv
    CSV.generate do |csv|
-     header = data.first
+     header = data.sort_by { |h| h.keys.size }.last
      csv << header.keys.map(&:to_s) if header
      data.each { |row| csv << row.values }
    end
  end

Not exactly quick though:

e = Project::Export.new(project)

puts Benchmark.measure { e.data.sort_by { |h| h.keys.size }.last }
# =>  3.009553   0.175640   3.185193 (  4.889427)
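
If we did go with the hack, `max_by` would pick a row with the most keys in a single pass rather than sorting the whole array. I'd guess most of the time measured above goes into building `e.data` rather than the sort itself, so this may not help much. A minimal sketch, with made-up plain hashes standing in for the export rows:

```ruby
# Plain hashes stand in for the rows returned by Project::Export#data.
# The sample values are invented for illustration.
rows = [
  { request_title: 'A', dataset_contributor: nil },
  { request_title: 'B', dataset_contributor: 'someone', 'Question 1' => '1' }
]

# max_by walks the array once instead of sorting it.
header = rows.max_by { |row| row.keys.size }

# Header cells are the keys of the widest row, as strings.
header_keys = header.keys.map(&:to_s)
```

This still leaves the shorter rows with fewer cells than the header, so it only addresses the missing header names.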

garethrees · 2024-03-04T13:14:40Z

Ideally we'd get the keys from the dataset associated with the project. That's available to us, given we instantiate Project::Export with a Project, but I don't like that the CSV headers would end up disconnected from the collected data. It would be easy for the two to get out of sync in future if, for example, we allowed users to edit/reorder the datasets.

I think what we should really do is always use the dataset keys associated with the project to then go and get the associated values from each submission.

A better intermediate solution might be to change Project::Export::InfoRequest#extracted_values_as_hash so that it returns keys with empty values instead of a completely empty hash if there are no extracted values:

  def extracted_values_as_hash
    return {} unless extracted_values
    extracted_values.joins(:key).pluck('dataset_keys.title', :value).to_h
  end
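
A rough sketch of that intermediate idea, using plain Ruby rather than the real ActiveRecord query. The method signature and both arguments here are assumptions for illustration, not the existing method:

```ruby
# Hypothetical stand-in for Project::Export::InfoRequest#extracted_values_as_hash.
# key_titles: the project's dataset key titles, in column order (assumed input).
# extracted:  { title => value } for the submission, or nil if nothing was extracted.
def extracted_values_as_hash(key_titles, extracted)
  # Start with every key present and no value.
  blank = key_titles.to_h { |title| [title, nil] }
  return blank unless extracted

  # Overlay whatever values were actually extracted.
  blank.merge(extracted)
end

without_extraction = extracted_values_as_hash(['Q1', 'Q2'], nil)
with_extraction = extracted_values_as_hash(['Q1', 'Q2'], { 'Q2' => '0' })
```

With this shape every request produces the same set of keys, so using the first request for the header would no longer lose the dataset columns.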

Project::Export uses the first request in the project to generate the CSV headers. However, if the request hasn't had data extracted, the headers for the dataset columns will be blank, even though other requests in the project may have had a dataset extraction. This commit ensures that headers are always present by looking up the project's key set, and iterating through that to fetch the relevant submission value for that key, or otherwise assigning a nil value. This has the benefit of ensuring that the keys/values exported are always in sync with the current project key set. Fixes #8141.
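
As a sketch of the approach that commit message describes (not the merged code): drive both the header and every row from one list of columns, so a request with no extraction just yields empty cells. `export_csv`, `fixed_keys` and `dataset_keys` are hypothetical names, and the rows are made-up plain hashes:

```ruby
require 'csv'

# Hypothetical helper, not the real Project::Export API.
# fixed_keys:   the always-present columns (assumed input).
# dataset_keys: the project's dataset key titles (assumed input).
def export_csv(rows, fixed_keys, dataset_keys)
  columns = fixed_keys + dataset_keys

  CSV.generate do |csv|
    # The header comes from the key set, never from an individual row.
    csv << columns.map(&:to_s)
    # Each row looks up its value per column; missing keys become nil.
    rows.each { |row| csv << columns.map { |key| row[key] } }
  end
end

rows = [
  { request_title: 'A' },               # no dataset extraction yet
  { request_title: 'B', 'Q1' => '1' }
]

csv = export_csv(rows, [:request_title], ['Q1'])
```

Because each cell is looked up by column, the values stay lined up with the header even if the key set is later edited or reordered.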

garethrees added the x:uk, bug (Breaks expected functionality) and f:projects labels Feb 27, 2024

garethrees mentioned this issue Mar 11, 2024

Fix missing headers when exporting Project data #8159

Merged

garethrees closed this as completed in 458abfa Mar 13, 2024
