Unnecessary string distance calculation when using @Field [DATAMONGO-1991] #2862

Open
spring-projects-issues opened this issue Jun 1, 2018 · 7 comments
Labels: status: waiting-for-triage, type: enhancement

Comments

@spring-projects-issues

Ludek Novotny opened DATAMONGO-1991 and commented

When the @Field annotation is used to give a MongoDB document field a different name than the bean field, the string distance (org.springframework.beans.PropertyMatches.calculateStringDistance) is calculated for all combinations of fields. If the distance is too large, the name from the annotation is used as a fallback.
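
To illustrate why this is costly: an edit distance is usually computed with dynamic programming, in time proportional to the product of the two string lengths, and it has to be repeated for every candidate property name. The following is a minimal Levenshtein sketch in plain Java. It is an assumption about the kind of work involved, not the actual body of PropertyMatches.calculateStringDistance, and the class name StringDistanceSketch is made up for illustration.

```java
// Hypothetical illustration only; not Spring's implementation.
final class StringDistanceSketch {

    // Classic dynamic-programming Levenshtein distance.
    // Time is O(a.length() * b.length()); memory is two rows of the table.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full table.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        // Document field name vs. bean property name from the example entity.
        System.out.println(distance("field1", "fieldOne")); // prints 3
    }
}
```

Under this assumption, one unresolved name on an entity with around 70 properties means on the order of 70 such computations, which adds up quickly when it happens once per loaded document.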

This causes a big performance hit in our application. Our workaround was to implement a cache in PropertyMatches, but a more permanent solution would be appreciated, as we don't really want to maintain our own version of spring-beans. We also believe a cache isn't the best solution. Is there a reason why the field name from @Field isn't used with the highest priority, with string distance as the fallback?
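
A cache of the kind described could look roughly like the sketch below. It is hypothetical and is not the patch applied to spring-beans in this report: the class CachedDistanceSketch and its constructor are invented for illustration, and the distance function is passed in rather than taken from PropertyMatches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToIntBiFunction;

// Hypothetical memoizing wrapper around an arbitrary string-distance function.
final class CachedDistanceSketch {

    // Key is the ordered pair of input strings; List provides equals/hashCode.
    private final Map<List<String>, Integer> cache = new ConcurrentHashMap<>();
    private final ToIntBiFunction<String, String> delegate;

    CachedDistanceSketch(ToIntBiFunction<String, String> delegate) {
        this.delegate = delegate;
    }

    int distance(String a, String b) {
        // The delegate runs only the first time a given pair is seen.
        return cache.computeIfAbsent(Arrays.asList(a, b),
                key -> delegate.applyAsInt(a, b));
    }

    public static void main(String[] args) {
        // Stand-in distance (length difference) just to exercise the cache.
        CachedDistanceSketch cached =
                new CachedDistanceSketch((a, b) -> Math.abs(a.length() - b.length()));
        System.out.println(cached.distance("field1", "fieldOne"));
        System.out.println(cached.distance("field1", "fieldOne")); // served from the cache
    }
}
```

The cache is unbounded, which is acceptable only because the set of property names in an application is finite. It also hides the cost rather than removing the unnecessary calculation, which is why a cache is not the preferred fix.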

This issue is somewhat related to BATCH-1876, but our use case is with Mongo. Our application runs on Spring Boot 2.0.0.RELEASE.


Reference URL: https://jira.spring.io/browse/BATCH-1876

@spring-projects-issues

Oliver Drotbohm commented

Can you please clarify how this affects Spring Data MongoDB? Given the information provided so far, I don't see any connection here.

I briefly checked, and we only use PropertyMatches when we cannot resolve a PropertyPath and have to prepare a helpful exception message. The use of @Field alone doesn't actually trigger that calculation.

@spring-projects-issues

Ludek Novotny commented

Yes, it's not caused by @Field alone. The string distance is calculated when a PropertyReferenceException is created. That exception is then caught in QueryMapper.getPath, which returns null. The bean field is still mapped correctly, probably using the name provided in @Field, so in this case the distance was calculated unnecessarily.
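
The pattern described here, an exception that pays for a helpful message at construction time and is then swallowed, can be sketched in isolation. The code below is a hypothetical illustration and does not reproduce PropertyReferenceException or QueryMapper: all class and method names are invented. It shows one possible remedy, building the message lazily so that a caught-and-discarded exception never triggers the expensive work.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of deferring expensive message construction.
final class LazyMessageSketch {

    // Exception that computes its message only when someone asks for it.
    static final class UnresolvedPropertyException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        private final transient Supplier<String> messageSupplier;

        UnresolvedPropertyException(Supplier<String> messageSupplier) {
            this.messageSupplier = messageSupplier;
        }

        @Override
        public String getMessage() {
            return messageSupplier.get();
        }
    }

    // Counts how often the (pretend) expensive suggestion logic runs.
    static final AtomicInteger suggestionRuns = new AtomicInteger();

    // Stand-in for computing string distances against all property names.
    static String expensiveSuggestions(String name) {
        suggestionRuns.incrementAndGet();
        return "No property '" + name + "' found";
    }

    // Mirrors the catch-and-return-null flow described above.
    static String resolveOrNull(String name) {
        try {
            throw new UnresolvedPropertyException(() -> expensiveSuggestions(name));
        } catch (UnresolvedPropertyException e) {
            return null; // message never requested, so no suggestion work happens
        }
    }

    public static void main(String[] args) {
        resolveOrNull("field1");
        System.out.println("suggestion runs after swallowed exception: " + suggestionRuns.get());
    }
}
```

Whether such a change would fit the real exception type is an open question; the sketch only demonstrates that the cost can be tied to reading the message rather than to creating the exception.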

@spring-projects-issues

Oliver Drotbohm commented

Can you please provide more information on what code you're executing? It feels unusual that you have code that triggers that exception repeatedly.

@spring-projects-issues

Ludek Novotny commented

Here is a simplified example. We use an account entity that has around 70 fields, most of them annotated with @Field. Some fields are separate entities, each with another 5-10 annotated fields. We use @Field quite a lot.

import lombok.Data;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Field;

@Document(collection = "accounts")
@Data
public class AccountTest {    
   @Field("field1")
   private String fieldOne;    
}

Each entity is loaded from the database and processed. We have to process millions of unique entities several times, and in the process we generate new ones. Let's say that in total we have to load 100,000,000 entities from the database. This is the test we used to debug the problem. The first query doesn't trigger the exception because it uses the bean property name. The second query, which uses the document field name, triggers the string distance calculation.

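import java.util.List;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class FieldTest {

    @Autowired
    private MongoTemplate template;

    @Before
    public void setup() {
        // Start from an empty collection on every run.
        template.dropCollection("accounts");
    }

    @Test
    public void test() {
        AccountTest account = new AccountTest();
        account.setFieldOne("123");
        template.save(account);

        // Query by bean property name: resolves without an exception.
        List<AccountTest> list = template.find(Query.query(Criteria.where("fieldOne").is("123")), AccountTest.class);
        // Query by document field name: triggers the string distance calculation.
        List<AccountTest> list2 = template.find(Query.query(Criteria.where("field1").is("123")), AccountTest.class);
    }
}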

@spring-projects-issues

Oliver Drotbohm commented

Thanks for the detailed writeup, Ludek. I have a couple of follow-up questions:

  1. Why is anyone issuing the second query in the first place? Nobody should. If you use the Criteria API, always refer to property names.
  2. Are you triggering said query 100,000,000 times? If so, why?
  3. If you have a use case that reads that amount of data, have you considered skipping the object-to-document layer completely and using CollectionCallback etc. instead?

@spring-projects-issues

Ludek Novotny commented

Oh, this test doesn't represent our backend; it's just something we put together to debug and identify the sequence of calls that leads to the string distance calculation. Maybe I should have posted the actual backend code in the first place. Sorry about that. This is how it actually works:

We get a Stream<Document> of all documents to be processed. A single matching criterion, batchId, identifies the set of documents.

MongoTemplate template;
.......
MongoDatabase db = template.getDb();
MongoCollection<Document> collection = db.getCollection("account");
FindIterable<Document> cursor = collection.find(new BasicDBObject("batchId", batchId));
return StreamSupport.stream(cursor.spliterator(), false);

Each document from the stream is then converted to an Account. We don't have a custom Bson2Account converter; the conversion relies only on Spring and the driver.

MongoTemplate template;
.....
public Account ConvertBsonDocument2Account(Document accountObj) {
    return template.getConverter().read(Account.class, accountObj);
}

When the conversion happens, the distance is calculated.

spring-projects-issues added the status: waiting-for-feedback and type: enhancement labels on Dec 30, 2020
@spring-projects-issues

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

spring-projects-issues added the status: feedback-reminder label on Jan 6, 2021
christophstrobl added the status: waiting-for-triage label and removed the status: feedback-reminder and status: waiting-for-feedback labels on Jan 11, 2021

3 participants