Project dataset export missing headers #8141

Closed
garethrees opened this issue Feb 27, 2024 · 2 comments · Fixed by #8159
Labels
bug Breaks expected functionality f:projects x:uk

Comments

@garethrees
Member

See user query.

Project::Export uses the first request in the project to generate the CSV headers. However, if that request hasn't had data extracted, the header row has no dataset columns at all, even though other requests in the project may have had a dataset extraction.

project = Project.find(28)
# => #<Project:0x00007fc4d2992e58

Project::Export::InfoRequest.new(project, project.info_requests.first).data
# =>
# => {:request_url=>"https://www.whatdotheyknow.com/request/[REDACTED]",
# =>  :request_title=>"[REDACTED]",
# =>  :public_body_name=>"Blaenau Gwent County Borough Council",
# =>  :request_owner=>"[REDACTED]",
# =>  :latest_status_contributor=>"[REDACTED]",
# =>  :status=>"partially_successful",
# =>  :dataset_contributor=>nil}

Project::Export::InfoRequest.new(project, project.info_requests.last).data
# =>
# => {:request_url=>"https://www.whatdotheyknow.com/request/[REDACTED]",
# =>  :request_title=>"[REDACTED]",
# =>  :public_body_name=>"Wyre Forest District Council",
# =>  :request_owner=>"[REDACTED]",
# =>  :latest_status_contributor=>"[REDACTED]",
# =>  :status=>"partially_successful",
# =>  :dataset_contributor=>"[REDACTED]",
# =>  "Was this [REDACTED]"=>"1",
# =>  "Were [REDACTED]"=>"0",
# =>  "What reasons [REDACTED]"=>"[REDACTED]",
# =>  "How many years [REDACTED]"=>"[REDACTED]",
# =>  "Total number of [REDACTED]"=>"[REDACTED]",
# =>  "Total number of [REDACTED]"=>"[REDACTED]",
# =>  "How many [REDACTED]"=>"[REDACTED]",
# =>  "Was the [REDACTED]"=>"[REDACTED]"}
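The effect on the export can be sketched with plain hashes standing in for `Project::Export::InfoRequest#data`. This is an illustration only: the titles and values below are made up, and the header is built the same way the current `to_csv` does it, from the first row's keys.

```ruby
# Stand-in rows for Project::Export::InfoRequest#data; titles and values are
# made up. Only the second request has had dataset values extracted.
rows = [
  { request_title: "Request A", status: "partially_successful",
    dataset_contributor: nil },
  { request_title: "Request B", status: "partially_successful",
    dataset_contributor: "Contributor", "Question 1" => "1", "Question 2" => "0" }
]

# Current behaviour: the header row is built from the first row's keys only
header = rows.first.keys.map(&:to_s)
body = rows.map(&:values)

p header           # ["request_title", "status", "dataset_contributor"]
p body.map(&:size) # [3, 5]: the last two values in row 2 have no header
```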
@garethrees garethrees added x:uk bug Breaks expected functionality f:projects labels Feb 27, 2024
@garethrees
Member Author

As a quick hack we could pick the hash with the most keys as the header:

  def to_csv
    CSV.generate do |csv|
-     header = data.first
+     header = data.sort_by { |h| h.keys.size }.last
      csv << header.keys.map(&:to_s) if header
      data.each { |row| csv << row.values }
    end
  end

Not exactly quick, though:

e = Project::Export.new(project)

puts Benchmark.measure { e.data.sort_by { |h| h.keys.size }.last }
# =>  3.009553   0.175640   3.185193 (  4.889427)
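A variant of the hack could use `max_by`, which makes a single pass rather than sorting every row, and `values_at`, which looks each value up by column name. This is a hypothetical helper, not project code. Note that the timing above also includes building `e.data`, so it is unclear how much of it the sort accounts for. The approach also assumes the widest row's keys are a superset of every other row's keys.

```ruby
# Hypothetical helper, not the project's code: build CSV-ready rows from an
# array of row hashes, using the row with the most keys as the header.
def rows_with_widest_header(data)
  return [] if data.empty?

  # max_by makes a single pass instead of sorting every row by key count
  header = data.max_by { |row| row.keys.size }.keys
  # values_at looks each value up by column, so each value lands under its
  # own header and columns a row lacks become nil
  [header.map(&:to_s)] + data.map { |row| row.values_at(*header) }
end

rows_with_widest_header([{ status: "x" }, { status: "y", "Question 1" => "1" }])
# => [["status", "Question 1"], ["x", nil], ["y", "1"]]
```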

@garethrees
Member Author

Ideally we'd get the keys from the dataset associated with the project. That's available to us, since we instantiate Project::Export with a Project, but I don't like that the CSV headers would then be disconnected from the collected data. It would be easy for the two to get out of sync in future if we, for example, allowed users to edit or reorder the datasets.

I think what we should really do is always use the dataset keys associated with the project to then go and get the associated values from each submission.

A better intermediate solution might be to change Project::Export::InfoRequest#extracted_values_as_hash so that it returns keys with empty values instead of a completely empty hash if there are no extracted values:

  def extracted_values_as_hash
    return {} unless extracted_values
    extracted_values.joins(:key).pluck('dataset_keys.title', :value).to_h
  end
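The intermediate solution can be sketched in plain Ruby outside ActiveRecord. The two arguments are hypothetical stand-ins: `key_titles` for the project's dataset key titles in order, and `extracted` for the title-to-value hash the `pluck` query builds, or `nil` when nothing has been extracted. Because every row then carries the same dataset keys, taking the header from `data.first` would include the dataset columns.

```ruby
# Hypothetical plain-Ruby sketch of the intermediate fix. key_titles stands in
# for the project's dataset key titles (in order); extracted stands in for the
# title => value hash the pluck query builds, or nil when nothing was extracted.
def extracted_values_as_hash(key_titles, extracted)
  values = extracted || {}
  # Every key title is always present, in key-set order; missing values are nil
  key_titles.to_h { |title| [title, values[title]] }
end

extracted_values_as_hash(["Question 1", "Question 2"], nil)
# => {"Question 1" => nil, "Question 2" => nil}
extracted_values_as_hash(["Question 1", "Question 2"], { "Question 2" => "0" })
# => {"Question 1" => nil, "Question 2" => "0"}
```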

garethrees added a commit that referenced this issue Mar 11, 2024
Project::Export uses the first request in the project to generate the
CSV headers. However, if the request hasn't had data extracted, the
headers for the dataset columns will be blank, even though other
requests in the project may have had a dataset extraction.

This commit ensures that headers are always present by looking up the
project's key set, and iterating through that to fetch the relevant
submission value for that key, or otherwise assigning a nil value.

This has the benefit of ensuring that the keys/values exported are
always in sync with the current project key set.

Fixes #8141.
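The approach the commit message describes can be sketched as follows. Headers come from the project's key set, and each request gets one cell per key, filled with its value or `nil`. This is not the code from #8159. `DatasetKey`, `ExportedRequest` and `rows_from_key_set` are hypothetical stand-ins. Keying values by key id rather than title is an assumption, chosen so that two keys sharing a title cannot overwrite each other.

```ruby
# Hypothetical stand-ins for the real records: a dataset key has an id and a
# title, and each exported request has its base columns plus the values
# extracted for it, keyed by dataset key id (nil if nothing was extracted).
DatasetKey = Struct.new(:id, :title)
ExportedRequest = Struct.new(:base, :values_by_key_id)

# Headers come from the project's key set, not from any single request, so
# every row has one cell per key even when that request has no extraction.
def rows_from_key_set(base_columns, key_set, requests)
  header = base_columns.map(&:to_s) + key_set.map(&:title)
  body = requests.map do |request|
    values = request.values_by_key_id || {}
    request.base.values_at(*base_columns) + key_set.map { |key| values[key.id] }
  end
  [header, *body]
end

keys = [DatasetKey.new(1, "Question 1"), DatasetKey.new(2, "Question 2")]
requests = [
  ExportedRequest.new({ request_title: "Request A", status: "successful" }, nil),
  ExportedRequest.new({ request_title: "Request B", status: "successful" }, { 1 => "1", 2 => "0" })
]
rows_from_key_set(%i[request_title status], keys, requests)
# => [["request_title", "status", "Question 1", "Question 2"],
#     ["Request A", "successful", nil, nil],
#     ["Request B", "successful", "1", "0"]]
```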
alexander-griffen pushed a commit that referenced this issue Mar 22, 2024