
Action-based interactionStatistics are implicitly counting only the '/object' entity of an Action; can we count '/agent' too? #2858

Closed
danbri opened this issue Mar 12, 2021 · 13 comments
Labels
Queued for Editorial Work Editor needs to turn issues/PRs into final code and release notes.

Comments

@danbri
Contributor

danbri commented Mar 12, 2021

This is feedback on InteractionStatistic from Google Search, based on our experience actively consuming the current markup from sites (e.g. see the video-related docs). Google has been investigating ways to make more use of this and encountered this issue.

For example, if this markup were attached to a VideoObject,

{
  "@type": "InteractionCounter",
  "interactionType": { "@type": "http://schema.org/WatchAction" },
  "userInteractionCount": 5647018
}

... we are counting the number of WatchActions in which the entity it is attached to (the video) is the '/object' of the Action.

Can we find a model to make this more explicit (in docs and data), and extend it to allow '/agent' statistics to be published this way too? For actions such as AskAction, FollowAction, ReplyAction, ..., both numbers are potentially interesting.

For example, if we added these ...:

        "userInteractionObjectCount": 5647018,
        "userInteractionAgentCount": 2234

where userInteractionObjectCount is an alias for, subproperty of, or eventual replacement for userInteractionCount, and userInteractionAgentCount means the same but counts agents instead of objects. Potentially this could also be extended to other properties of the Agent.
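A minimal sketch of how a consumer might read the proposed pair, assuming (as floated above) that userInteractionObjectCount is treated as an alias of userInteractionCount. Both new property names are only proposals in this comment, not defined schema.org terms, and the function name is hypothetical.

```python
from typing import Optional


def read_counts(counter: dict) -> tuple[Optional[int], Optional[int]]:
    """Return (object_count, agent_count) from an InteractionCounter-like dict.

    Assumes the hypothetical properties proposed in this issue:
    userInteractionObjectCount (alias of userInteractionCount) and
    userInteractionAgentCount. Missing values come back as None.
    """
    object_count = counter.get("userInteractionObjectCount")
    if object_count is None:
        # Fall back to the existing property, which implicitly counts
        # the '/object' role.
        object_count = counter.get("userInteractionCount")
    agent_count = counter.get("userInteractionAgentCount")
    return object_count, agent_count


counter = {
    "@type": "InteractionCounter",
    "interactionType": {"@type": "http://schema.org/FollowAction"},
    "userInteractionObjectCount": 5647018,
    "userInteractionAgentCount": 2234,
}
print(read_counts(counter))  # (5647018, 2234)
```

Keeping the old property as a fallback means existing markup would still be read as an object-role count without any change.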

@danbri danbri self-assigned this Mar 12, 2021
@hartmannr76

hartmannr76 commented Mar 19, 2021

How does adding the additional field alter the interpretation of new fields on InteractionCounter? If other proposals (e.g. #2825) are accepted, it seems there would be no way to tell whether the start/end times are to be interpreted on the agent or on the object.

Some other options I see are:

  1. Continue with this direction, but other fields may need to include the same object/agent identifier in their names
  2. Create new Action types that identify the direction (FollowAction vs. something like FolloweeAction). My general concern with this approach is that we would need to keep adding new actions to indicate the other direction
  3. Create a new property on InteractionCounter, as an enum, to indicate whether it applies to the agent or the object:
       "@type": "InteractionCounter",
       "interactionType": { "@type": "http://schema.org/FollowAction" },
       "userInteractionCount": 5647018,
       "appliesTo": "Agent" // or "Object"
  4. Subtype InteractionCounter with AgentInteractionCounter and ObjectInteractionCounter so that the type specifies the direction the value applies to

The biggest problem I see with 2, 3 and 4 is that they make the output slightly more verbose, as there will be more repeated text:

"interactionStatistic": [ 
{
  "@type": "InteractionCounter",
  "interactionType": { "@type": "http://schema.org/FollowAction" },
  "userInteractionCount": 2234,
  "appliesTo": "Agent"
},
{
  "@type": "InteractionCounter",
  "interactionType": { "@type": "http://schema.org/FollowAction" },
  "userInteractionCount": 5647018,
  "appliesTo": "Object"
}
]

However, it allows other extended fields to be applied directly to the target action. Options 3 and 4 give us the added benefit of reusing the existing Actions, which makes me slightly more partial towards them.
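To make the defaulting question concrete, here is a sketch of how a consumer might split an interactionStatistic array written in option 3's style. appliesTo is only the hypothetical property proposed above, and the function name is made up; counters without appliesTo are deliberately kept in a separate bucket instead of being silently assigned a direction.

```python
from collections import defaultdict


def split_by_direction(counters: list[dict]) -> dict[str, list[int]]:
    """Group userInteractionCount values by the hypothetical appliesTo value."""
    buckets: dict[str, list[int]] = defaultdict(list)
    for counter in counters:
        # No default direction: a missing appliesTo is treated as ambiguous.
        direction = counter.get("appliesTo", "Unknown")
        buckets[direction].append(counter["userInteractionCount"])
    return dict(buckets)


stats = [
    {"@type": "InteractionCounter",
     "interactionType": {"@type": "http://schema.org/FollowAction"},
     "userInteractionCount": 2234, "appliesTo": "Agent"},
    {"@type": "InteractionCounter",
     "interactionType": {"@type": "http://schema.org/FollowAction"},
     "userInteractionCount": 5647018, "appliesTo": "Object"},
    {"@type": "InteractionCounter",
     "interactionType": {"@type": "http://schema.org/FollowAction"},
     "userInteractionCount": 42},
]
print(split_by_direction(stats))
# {'Agent': [2234], 'Object': [5647018], 'Unknown': [42]}
```

The "Unknown" bucket is exactly the case an open format has to decide on: either guess a default or discard the number.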

@danbri
Contributor Author

danbri commented Mar 22, 2021

@hartmannr76 and I just spoke about this.

One additional point to track is that

"interactionType": { "@type": "http://schema.org/FollowAction" },

... is not the current model. The interactionType property's value should be the type itself, and not a thing that has a "type" relationship to the type (i.e. the value of interactionType is the FollowAction type itself, not a FollowAction instance).
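The difference between the two readings can be sketched as a small normalizer a consumer might apply, accepting either the type itself (a type IRI string, the intended model) or the instance-style object shown in the snippet above. The function name is hypothetical, and only a single string "@type" is handled.

```python
from typing import Union


def interaction_type_iri(value: Union[str, dict]) -> str:
    """Return the Action type named by an interactionType value.

    Accepts the type itself (e.g. "http://schema.org/FollowAction") or the
    instance style {"@type": "http://schema.org/FollowAction"}; in the latter
    case only the instance's type is used and any other fields are ignored.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and isinstance(value.get("@type"), str):
        return value["@type"]
    raise ValueError(f"unrecognised interactionType value: {value!r}")


as_type = "http://schema.org/FollowAction"
as_instance = {"@type": "http://schema.org/FollowAction"}
print(interaction_type_iri(as_type) == interaction_type_iri(as_instance))  # True
```

Collapsing both forms to the same type loses whatever else an instance might carry, which is why the choice between the two styles matters for the start/end time and location questions.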

Aside: I see today that Google's docs use the interactionType property in the latter style. I don't believe this was the original intention, and will investigate.

Being clear about this may help us work through what it means to mention properties applicable to actions (start/end times, locations) in the context of an InteractionCounter description.

We should also note that there's an under-articulated assumption that an InteractionCounter counts with respect to a particular site, dataset or similar context (i.e. the "interactionService"), and is unlikely to be an attempt at global truth. Nobody knows how many people have seen, e.g., the movie Ghostbusters, but you might expect Netflix, YouTube, etc. to have some numbers for a particular representation of that movie within their systems. The same goes for social network follow/friend/like counts.

It also implicitly rests on the idea that the counting-platform has some clearish notion of entity identity such that things can be counted, and in general that the underlying data is not explicitly "in schema.org".

@hartmannr76 and I also discussed the difficulty with "appliesTo" and subtyping here, in that both would lean heavily on defaulting, which is particularly challenging in open, flexible formats.

There's an underlying goal here of picking out a subset of all actions, such that they:

  • are counted as having happened on some interaction-counting platform (the /interactionService)
  • may or may not be restricted by the time/space in which they occurred

Once we have such a set of actions, we can report on the number of distinct users/agents that have participated in it.
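The selection described above can be sketched over a hypothetical log of action records. Each record is a plain dict with type, agent, object, service and time keys; that record shape is an assumption for illustration, not schema.org markup. The sketch picks out the actions of one type recorded on one service, optionally within a time window, then reports both the number of occurrences and the number of distinct agents.

```python
from datetime import datetime


def select_actions(log, action_type, service, start=None, end=None):
    """Yield actions of one type, recorded on one service, inside [start, end)."""
    for action in log:
        if action["type"] != action_type or action["service"] != service:
            continue
        if start is not None and action["time"] < start:
            continue
        if end is not None and action["time"] >= end:
            continue
        yield action


def summarize(log, action_type, service, start=None, end=None) -> dict:
    """Count matching occurrences and the distinct agents behind them."""
    selected = list(select_actions(log, action_type, service, start, end))
    return {
        "occurrences": len(selected),
        "distinct_agents": len({a["agent"] for a in selected}),
    }


log = [
    {"type": "WatchAction", "agent": "alice", "object": "video1",
     "service": "example-site", "time": datetime(2021, 3, 1)},
    {"type": "WatchAction", "agent": "alice", "object": "video1",
     "service": "example-site", "time": datetime(2021, 3, 2)},
    {"type": "WatchAction", "agent": "bob", "object": "video1",
     "service": "other-site", "time": datetime(2021, 3, 2)},
]
print(summarize(log, "WatchAction", "example-site"))
# {'occurrences': 2, 'distinct_agents': 1}
```

The two numbers differ as soon as one agent repeats an action, which is one reason it helps to be explicit about which of them a published statistic reports.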

For a similar but different situation, see http://schema.org/ProductGroup, which provides a place in which a kind of prototypical "Product" can stand in for several specific kinds of products (which in turn are still not usually the actual products in your hand, just more specific). E.g. a certain t-shirt design vs. that design "in XL and black" vs. a particular instance of that design that you're buying or own.

The difficulties here are around dealing with informal templating structures that pick out sets of actions, and doing so in a representation that is intended not to change its meaning radically when partial descriptions are provided, or at least one in which it is possible to determine when insufficient information is available for the data to be useful. On ProductGroup the variesBy property helps here: you can describe a set of products that varies by color, i.e. signal that the color property isn't merely missing on some ProductGroup. Similarly, with https://schema.org/StatisticalPopulation + https://schema.org/Observation the population has a "constrainingProperty" (examples in #2291), so we can talk about a population "of Persons with a homeLocation of East_Podunk_California".

Next steps - @rvguha - can we talk this through? Is convergence between InteractionCounter and the statistical aggregates vocabulary feasible? /cc @vholland

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label May 22, 2021
@rrlevering
Contributor

I feel like this thread got into the weeds with revamping the whole statistics vocabulary and lost track of the need. The need is becoming evident to us on the web, where sites are putting invalid schema in places to represent things that cannot easily be represented in schema.org. For instance, Quora currently uses schema.org/followerCount (which is not a schema.org term), and Instagram currently skips it and only represents incoming stats.

I have a different proposal which I think is simpler. interactionStatistic currently implies an incoming direction on the interaction (based on the wording and most usage). I propose a separate property that represents an outgoing direction, with a domain of Person and Organization and a range of InteractionCounter. These would be "engagement statistics" in the social media sense. I don't really care what it's called, e.g. "engagementStatistics", "agentStatistics", etc. I think this is slightly easier to comprehend than modifying InteractionCounter, and the domain/range helps constrain its meaning to only types that have agency.

@schemaorg schemaorg deleted a comment from github-actions bot Mar 29, 2023
@danbri
Contributor Author

danbri commented Mar 29, 2023

Ok, I just met with @rrlevering to go over this. A lot of it is just textual changes. Schema.org has a pile of overlapping markup patterns which grew over time, and sometimes the changes made at Schema.org to support them didn't update all the relevant textual definitions. The draft below is verbose because it's good to be explicit; I hope it can be made more reader-friendly.

Background

In https://schema.org/InteractionCounter we currently have:

A summary of how users have interacted with this CreativeWork. In most cases, authors will use a subtype to specify the specific type of interaction.

This implicitly assumes that some (not necessarily all) properties referencing the counter fix the choice of entity being counted, most obviously by being properties of that entity. We should make clear:

  • that this already goes beyond CreativeWork (Organization and Person are also anticipated as types that use interactionStatistic), so "how users have interacted with this CreativeWork" is too specific.
  • that some context is needed if we want to be clear about what is being counted. To understand what is counted we need to know at least:
    • which specific thing (a creative work, or a person or organization, for example) is being counted, the specific kind of Action they are involved in, and
    • which of the various roles an Action has (potential candidates being: "agent", "object", "instrument", "location", "participant", "provider", "result"... or subproperties of these e.g. "lender", "borrower")
    • also any additional subsetting or filtering (e.g. "followers" meaning "followers on this site"; or actions known to this site). This is at least partially addressed by "interactionService".

Proposed Changes

Change 1: New text for InteractionCounter

Proposal for new text defining [[InteractionCounter]] (to be wordsmithed and whitespaced):

  • OLD: A summary of how users have interacted with this CreativeWork. In most cases, authors will use a subtype to specify the specific type of interaction.
  • NEW: A quantitative summary (typically a count) of action occurrences in which a particular entity plays a particular role in a set of actions, within some specified scope (such as a site or application). The [[interactionStatistic]] property, for example, can be used on an entity to provide statistics counting actions (whose nature is indicated with [[interactionType]]), where those actions occur or are recorded in a scope or context indicated by [[interactionService]]. When this scope is not explicitly provided, it must be assumed from other contextual information for the statistics to be meaningful. Actions in schema.org have a number of roles, principally "agent" and "object", for the do-er and the done-to entities. The count is of the number of times there has been an event of the specified type in which the entity we're concerned with has been in the appropriate role. Some but not all properties referencing an [[InteractionCounter]] provide this information. For example, with [[interactionStatistic]], the item it is used on is the item being counted, and the count is of the number of times that item plays the "object" role in occurrences of the indicated action type (including subtypes). The [[agentInteractionStatistic]] property works identically, except that the count concerns the "agent" role. Potentially other constructions could count other roles, or more complex patterns, but the basic framework will be items / actions / scope as outlined here.

(Ignore the chunky brackets; in our site definition files they're cross-links to related terms.)
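The counting rule in the proposed text (an entity, a role, and an action type including its subtypes) can be sketched as follows. The PARENT table is a small hand-copied excerpt of the schema.org Action hierarchy; the action records are hypothetical dicts with a type plus agent and object fields, and the function names are made up for illustration.

```python
# Small excerpt of the schema.org Action hierarchy: child type -> parent type.
PARENT = {
    "InteractAction": "Action",
    "CommunicateAction": "InteractAction",
    "CommentAction": "CommunicateAction",
    "ReplyAction": "CommunicateAction",
    "AskAction": "CommunicateAction",
    "FollowAction": "InteractAction",
    "CreateAction": "Action",
    "WriteAction": "CreateAction",
}


def is_subtype(action_type: str, ancestor: str) -> bool:
    """True if action_type is ancestor or (transitively) one of its subtypes."""
    current = action_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def count_role(actions: list[dict], entity: str, role: str, interaction_type: str) -> int:
    """Count actions of interaction_type (or a subtype) in which entity fills role."""
    return sum(
        1
        for a in actions
        if a.get(role) == entity and is_subtype(a["type"], interaction_type)
    )


actions = [
    {"type": "CommentAction", "agent": "person-foo", "object": "post-1"},
    {"type": "ReplyAction", "agent": "person-bar", "object": "person-foo"},
    {"type": "FollowAction", "agent": "person-bar", "object": "person-foo"},
    {"type": "FollowAction", "agent": "person-foo", "object": "person-baz"},
]
# interactionStatistic-style: person-foo in the "object" role.
print(count_role(actions, "person-foo", "object", "CommunicateAction"))  # 1
print(count_role(actions, "person-foo", "object", "InteractAction"))  # 2
# agentInteractionStatistic-style: person-foo in the "agent" role.
print(count_role(actions, "person-foo", "agent", "InteractAction"))  # 2
```

Because subtypes are included, a counter typed with a broad action such as InteractAction will generally report a larger number than one typed with FollowAction for the same entity and role.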

Change 2: New text for interactionStatistic

Currently https://schema.org/interactionStatistic says:

OLD: The number of interactions for the CreativeWork using the WebSite or SoftwareApplication. The most specific child type of InteractionCounter should be used.

Revised/proposed interactionStatistic:

NEW: The number of interactions for this entity, in a particular role (the '[[object]]'), in a particular action (indicated in the statistic), and in a particular context (i.e. [[interactionService]]). The value is an [[InteractionCounter]], which conveys both the quantitative information and these other pieces of supporting data.

Then we can do the same for a new agentInteractionStatistic property:

NEW: The number of interactions for this entity, in a particular role (the '[[agent]]'), in a particular action (indicated in the statistic), and in a particular context (i.e. [[interactionService]]). The value is an [[InteractionCounter]], which conveys both the quantitative information and these other pieces of supporting data.

Change 3: Non-textual tweaks

  • add Class to the rangeIncludes of interactionType (so that its value can be the Action type itself)
  • add an agentInteractionStatistic property as implied by the revised text.
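Under the proposed text, a Person's markup could carry both directions side by side. The sketch below builds such a JSON-LD object in Python. The profile URL, counts and site are made up, the helper name is hypothetical, and interactionType is written as a node reference to the type itself ({"@id": ...}), following the model described earlier in this thread, rather than as an instance.

```python
import json


def counter(action_type: str, count: int, service_url: str) -> dict:
    """Build one InteractionCounter (hypothetical helper for illustration)."""
    return {
        "@type": "InteractionCounter",
        "interactionType": {"@id": f"https://schema.org/{action_type}"},
        "userInteractionCount": count,
        "interactionService": {"@type": "WebSite", "url": service_url},
    }


site = "https://example.com/"
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/profile/foo",
    # The person in the "object" role: follow actions targeting this person.
    "interactionStatistic": [counter("FollowAction", 5647018, site)],
    # The person in the "agent" role: follows and posts made by this person.
    "agentInteractionStatistic": [
        counter("FollowAction", 2234, site),
        counter("WriteAction", 8, site),
    ],
}
print(json.dumps(person, indent=2))
```

The same FollowAction type appears under both properties; only the property used says which role the person plays.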

@danbri danbri added Queued for Editorial Work Editor needs to turn issues/PRs into final code and release notes. and removed no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). labels Mar 29, 2023
@danbri
Contributor Author

danbri commented Mar 29, 2023

If a site/app is actually counting the final states of the world that the actions lead to (e.g. follower count vs number of followers), and if we care enough, we could add a boolean to the counter to indicate this. It's a pedantic point but it could cause problems if sites take action-counting literally, whereas others read it as an indicator of the situation caused by the action. Or maybe we can soften the wording a little? We're pretty far down this route already (1000s of sites)...

@rrlevering
Contributor

I don't think you want to go down that road, even if it's more accurate. I think it points toward not using actions at all, which would be massively disruptive. We can represent follower count as the count of follow actions that have not been "taken back"/unfollowed, rather than as a state/sum of them.
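This reading of a follower count (follow actions that have not been taken back) can be sketched over a hypothetical chronological event log of (kind, follower, followee) tuples, where kind is "follow" or "unfollow". The log format and function name are assumptions for illustration; real platforms will also differ on blocks, deleted accounts and so on.

```python
def follower_count(events: list[tuple[str, str, str]], followee: str) -> int:
    """Count followers of `followee` whose most recent action was a follow."""
    following: dict[str, bool] = {}
    for kind, follower, target in events:  # events are in chronological order
        if target != followee:
            continue
        if kind == "follow":
            following[follower] = True
        elif kind == "unfollow":
            following[follower] = False
    return sum(following.values())


events = [
    ("follow", "alice", "foo"),
    ("follow", "bob", "foo"),
    ("unfollow", "alice", "foo"),
    ("follow", "alice", "foo"),
    ("unfollow", "bob", "foo"),
]
print(follower_count(events, "foo"))  # 1
```

Counting only the latest state per follower keeps the number action-based while still matching what users see as a follower count.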

@danbri
Contributor Author

danbri commented Mar 29, 2023 via email

@rrlevering
Contributor

Another idea that occurred to me later in this discussion is that we could use the instance fields on the Action itself for this purpose. For instance:

{
  "@id": "person-foo",
  "interactionStatistic": [
    {
      "@type": "InteractionCounter",
      "interactionType": { "@type": "http://schema.org/FollowAction", "agent": { "@id": "person-foo" } },
      "userInteractionCount": 2234
    },
    {
      "@type": "InteractionCounter",
      "interactionType": { "@type": "http://schema.org/FollowAction", "object": { "@id": "person-foo" } },
      "userInteractionCount": 2234
    }
  ]
}

That would explicitly indicate the directionality of the action (note that followee on FollowAction, which is also redundant, doesn't really help because we need agent OR object). However, this style would essentially mandate a cycle in your markup graph, and it nests the directionality fairly deep in the schema. So I'm still in favor of the new predicate, to make authoring easier.
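For comparison, a consumer of the instance-style markup above would have to recover the direction by matching the nested agent/object @id against the outer node's @id, which is the cycle mentioned here. A sketch, assuming the markup is already parsed into dicts; the function name is hypothetical.

```python
from typing import Optional


def direction(node_id: str, counter: dict) -> Optional[str]:
    """Return "agent" or "object" if the nested action points back at node_id."""
    action = counter.get("interactionType", {})
    if not isinstance(action, dict):
        return None  # a bare type carries no role information
    for role in ("agent", "object"):
        ref = action.get(role)
        if isinstance(ref, dict) and ref.get("@id") == node_id:
            return role
    return None


node = {
    "@id": "person-foo",
    "interactionStatistic": [
        {"@type": "InteractionCounter",
         "interactionType": {"@type": "http://schema.org/FollowAction",
                             "agent": {"@id": "person-foo"}},
         "userInteractionCount": 2234},
        {"@type": "InteractionCounter",
         "interactionType": {"@type": "http://schema.org/FollowAction",
                             "object": {"@id": "person-foo"}},
         "userInteractionCount": 2234},
    ],
}
print([direction(node["@id"], c) for c in node["interactionStatistic"]])
# ['agent', 'object']
```

Every counter needs the back-reference for this to work, which is the authoring burden a dedicated predicate avoids.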

@danbri
Contributor Author

danbri commented Apr 5, 2023

@rrlevering yes - that's where I was going in my post too:

In theory the "interactionType": { "@type": "FollowAction" …} representation
gives a place where other detail could in theory be stored

We could also say, in the definition for interactionType: "The exact nature of the count criteria will vary site by site. Although it is indicated by a schema.org action, it is best not to assume the number is an exact number of action occurrences. For example, a site could reasonably choose to give a follower count, counting only follow actions that haven't been retracted. Handling of blocking, deleted accounts, hacked accounts, etc. will vary too." Or something in that vein, to decouple the actions mechanism a bit from the reality of these stats.

gmackenz added a commit to gmackenz/schemaorg that referenced this issue Sep 12, 2023
RE: issue schemaorg#2858 

Adding agentInteractionStatistic to capture user interaction such as on social media posting commenting.
@danbri danbri closed this as completed in dca27ac Oct 17, 2023
danbri added a commit that referenced this issue Oct 17, 2023
@Tiggerito

A question in the Search Central Community had me thinking.

There is mention of context for the InteractionCounter, which can be explicitly stated with interactionService to be a specific WebSite or SoftwareApplication. So you could say the count is for that object/actor within a specific website/app, say Twitter.

What happens if interactionService is not provided? Is the context the current website?

Take the example in the Google docs that includes agentInteractionStatistic.

This has a DiscussionForumPosting with an author with agentInteractionStatistic indicating 8 WriteActions. Would this be 8 posts on the website/app by that person?

DiscussionForumPosting also has comments. Each Comment has an author with agentInteractionStatistic indicating WriteAction. Should these be CommentAction? And the count would be for all comments by that person on the website/app.

Would it make sense to include both WriteAction and CommentAction for a person that both posts and comments? And LikeAction etc.

The Google documentation and examples are a bit confusing, with agentInteractionStatistic and WriteAction only showing up in the examples, while CommentAction is listed as an option but not used in the examples.

@rrlevering
Contributor

That is correct. I think the "universe" of most things is the data source from which they are derived. For instance, the author (defined by a profile/URL on the website) is common across all the pages the author is on. So the default context for the interaction counter for a WriteAction would be the overall website, not, for instance, that specific thread. The data on that author could be a subset of the data on a profile page, but it should be interpretable the same way on both pages.
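The defaulting described here can be sketched as a small resolver: use an explicit interactionService when present, otherwise fall back to the website the markup was found on, not the individual page or thread. The function name, and treating the page URL's scheme and host as "the website", are assumptions for illustration.

```python
from urllib.parse import urlsplit


def resolve_scope(counter: dict, page_url: str) -> str:
    """Return the context a counter should be read against."""
    service = counter.get("interactionService")
    if isinstance(service, dict) and service.get("url"):
        return service["url"]
    if isinstance(service, str):
        return service
    # No explicit scope: assume the whole site the page belongs to.
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"


counter = {"@type": "InteractionCounter", "userInteractionCount": 8}
print(resolve_scope(counter, "https://forum.example.com/t/123?page=2"))
# https://forum.example.com/
```

With this default, the same author counter read on a thread page and on the author's profile page resolves to the same scope.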

The nuances of the types are tricky. On some social media platforms, for example, a post and a comment become very similar. Semantically, I personally see WriteAction as more generic than CommentAction, rather than WriteAction = OP and CommentAction = response to OP, but in practice that's probably how they will be used to differentiate.

The Google documentation references the profile page on purpose for author markup, because it's trying to eventually normalize creator markup across multiple different types of authored content (DFP, Q&A, Article, Video, etc.). We're trying not to repeat it on every page.

@Tiggerito

Ah, I checked out the author/article link that said nothing about it. I did not check out the profile page one. Doh!

I'm getting my head around it now. The Person on the profile page can also have interactionStatistic entries, where they are the object of the count, e.g. who liked them.

Thanks.

@rrlevering
Contributor

rrlevering commented Dec 29, 2023 via email


5 participants
@danbri @rrlevering @hartmannr76 @Tiggerito and others