Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Journalist Interface and API return different subsets of Sources #6667

Open
3 tasks
cfm opened this issue Nov 2, 2022 · 9 comments · May be fixed by #6764
Open
3 tasks

Journalist Interface and API return different subsets of Sources #6667

cfm opened this issue Nov 2, 2022 · 9 comments · May be fixed by #6764

Comments

@cfm
Copy link
Member

cfm commented Nov 2, 2022

Description

The Journalist Interface and API return different subsets of Sources. Under certain circumstances, the same journalist user will see more sources listed in a SecureDrop Client session than they will see in a simultaneous Journalist Interface session.

Recommended approach

  1. Reconcile (or factor out?) the filters used by the Journalist Interface's index() and the Journalist API's get_all_sources()
  2. Bonus: Check whether loaddata.py::record_source_interaction() can use session.commit() as usual
  3. Bonus: Tests!

Steps to Reproduce

On the Server: Not an exact reproduction; this is my long-running test instance.

sdadmin@app:~$ sudo apt-cache policy securedrop-app-code
securedrop-app-code:
  Installed: 2.5.0+focal
  Candidate: 2.5.0+focal
  Version table:
 *** 2.5.0+focal 500
        500 https://apt.freedom.press/ focal/main amd64 Packages
        100 /var/lib/dpkg/status

Previously:

www-data@app:~/securedrop$ ./loaddata.py --seed 1547 \
    --journalist-count 0 \
    --source-count 3500 \
    --messages-per-source 1 \
    --replies-per-source 1

Then I used the Journalist Interface to Select All sources and Delete Selected.

Screenshot from 2022-11-02 17-39-58

Expected Behavior

On the Client: No sources are listed.

Actual Behavior

On the Client:

user@sd-app:~/securedrop-client$ sudo apt-cache policy securedrop-client
securedrop-client:
  Installed: 0.8.1+bullseye
  Candidate: 0.8.1+bullseye
  Version table:
 *** 0.8.1+bullseye 500
        500 https://apt.freedom.press/ bullseye/main amd64 Packages
        100 /var/lib/dpkg/status

Screenshot_2022-11-02_10-44-12

Comments

On the Server:

sdadmin@app:~$ sudo -u www-data sqlite3 /var/lib/securedrop/db.sqlite "SELECT * FROM sources;"
1|eb2df246-c2d6-46dd-af81-83608411870f|3IGB6CI2DKHC2OGLJTSZYND6SOFGAQNMV2D5BCMXNR52PPV5UKXED5BCSMLKNN5M5UQTZPTB5VPJQBZMJYN34DXGE5GYIHMMOXZT7EQ=|politic paperweight||0|0|
255|db09f881-3b8f-4dec-bdf6-ab0bb715ea86|4PBQ3ZMPOJTKRRZZHGBCAX74GHOBINT6LHET3YOSO4Q77XGMMF3M5RZZVNYTO7DSEDUAWBUT36DOYCHM6O3QDMV7NVLYBPTJEO7HZEQ=|blissful obverse||0|0|
3756|b6280f51-ad5f-448c-8697-45e70cd417c5|IJNT2TRBQNT4ZHXSKRHCA44EAEPBOY76I3IXGQNBAGQVL2R7TZ5UHLAFLPUDM2A6UNTH2IUXWVMZ5WQHO7V3DX6BHZBROD2D5XWSTEA=|odds-on interruption||1|0|

And sure enough, compare:

  • The Journalist Interface filters on last_updated != None:

    # Query for sources without stars, along with their unread
    # submission counts.
    unstarred = (
    db.session.query(Source, unread_stmt.c.num_unread)
    .filter_by(pending=False, deleted_at=None)
    .filter(Source.last_updated.isnot(None))

  • The Journalist API does not:

    @api.route("/sources", methods=["GET"])
    def get_all_sources() -> Tuple[flask.Response, int]:
    sources = Source.query.filter_by(pending=False, deleted_at=None).all()

@cfm
Copy link
Member Author

cfm commented Nov 2, 2022

This is the kind of bug that motivates me to argue that the Journalist Interface should be a consumer of the Journalist API, whereas they're currently independent implementations of overlapping functionality.

@eloquence
Copy link
Member

eloquence commented Nov 3, 2022

Thanks @cfm, and agree that consuming the API in the JI would be a best practice (with my standard caveat about investing in an implementation that I hope we'll be able to deprecate with the SecureDrop Workstation). In your estimation, could this issue plausibly arise outside the development environment / loaddata use case, or is last_updated always set in those cases?

@cfm
Copy link
Member Author

cfm commented Nov 3, 2022

That's a fair question to raise, @eloquence, and in practice I think you're right (to imply :-) that this is a rare edge-case. In theory, however, I don't see how it's meaningfully more rare via loaddata.py...

source.pending = False
source.last_updated = datetime.datetime.utcnow()
db.session.flush()

...than via the Source Interface's /submit:

logged_in_source_in_db.pending = False
logged_in_source_in_db.last_updated = datetime.now(timezone.utc)
db.session.commit()

My concern here, however, is not really that this state is a possibility, since it's really just an edge case of the normal pending = False case. I'm more interested in the asymmetry between the Journalist Interface and the Journalist API, which—much as I'd love to take this as our cue to do a bit of defensive refactoring—I estimate to be a one-line fix. :-)

@eloquence
Copy link
Member

Thanks - I was mostly trying to understand the risk of production-level data integrity issues or user-facing impact.

In theory, however, I don't see how it's meaningfully more rare

If this summary is an accurate description of the flush/commit behavior, then it seems like loaddata may sometimes not persist the last_updated information if it doesn't reach a commit call. It's not clear to me why this particular method doesn't commit and only flushes.

But yes, completely agree they should also behave the same way in terms of the query.

@cfm
Copy link
Member Author

cfm commented Nov 4, 2022 via email

@audiodude
Copy link
Contributor

I'll work on this

@audiodude
Copy link
Contributor

Newbie question here, but how do I get to the application that is marked as "On the client" with this interface:

image

@audiodude
Copy link
Contributor

Sorry to post so many questions on such a simple task, but here goes. What is the preferred way of reconciling the Journalist Interface and the Journalist API? That is, should they both filter by last_updated is None or neither?

@eloquence
Copy link
Member

I think the preferred behavior would be for the API to also filter with .last_updated.isnot(None). These are not valid sources and should never be returned.

The interface in the screenshot is the SecureDrop Client, which you see at https://github.com/freedomofpress/securedrop-client - let us now if you have any difficulty setting it up, and thanks for working on this :)

nabla-c0d3 added a commit to nabla-c0d3/securedrop that referenced this issue Apr 21, 2023
@audiodude audiodude removed their assignment Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants