Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pool performance #480

Merged
merged 21 commits into from Dec 13, 2018

Conversation

Projects
None yet
@saml1er
Copy link
Member

saml1er commented Sep 18, 2018

The code that is currently running MTA was designed to push GTA SA to its limits and make full use of its features, but what if we want to take things to the next level and implement features like unlimited streaming memory or, even better, custom physics where we allow scripters to take full control of how objects behave in the 3D world and their response to collisions using a 3D library like bulletphysics.

These things are nice to have but they come with a price and here, the price is performance. I'm working on some new features for MTA where I'm placing hooks on certain parts of code which are executed every frame per second. Normally, when we place a hook we need to get an equivalent class for it which are classes with SA suffix in their name and, of course, CClientEntity.

Problem

Just take a look at this code snippet:

bool CClientGame::ObjectBreakHandler(CObjectSAInterface* pObjectInterface, CEntitySAInterface* pAttackerInterface)
{
    if (pObjectInterface)
    {
        // Get our object and client object
        CObject*       pObject = g_pGame->GetPools()->GetObjectA((DWORD*)pObjectInterface);
        CClientObject* pClientObject = m_pManager->GetObjectManager()->GetSafe(pObject);

        // Is our client vehicle valid?
        if (pClientObject)
        {
            if (!pClientObject->IsBreakable(false))
                return false;

            // ... //
            // ... //
            // ... //
        }
    }
    return true;
}

This is a handler function which is called when a object has breaking properties and it gets damaged, so let's try to break down what's happening in the snippet:

First the variabel pObjectInterface is checked for null value, then the code proceeds to getting the object, now this is where interesting stuff happens. MTA is mostly using hash maps for storing data (not from C++) which takes O(1) time complexity for lookup and O(n) in worst cases according to some sources. This is completely fine but it doesn't stop there. It then goes on to verifying the object by calling GetSafe method of the object which requires another lookup for getting CClientEntity* instance for this object. The problem with CClientEntity* lookup is that it calls g_pClientGame->GetGameEntityXRefManager()->FindClientObject function which will take more time to search if you more objects. Even though I feel that it can be very easily optimized, it is negligible. But when you are hooking multiple rendering functions, that's when the problem starts.

Now, I'm pretty sure you've figured out by now where I'm getting with this, so I'll get to the point. I was looking at how MTA streams in and out elements, I realized that you can only have 140 max peds streamed in, actually it was less than 140 but anyway, that's the maximum amount MTA allows currently. If you we have more peds on one location, MTA will stream out the ones farther away and stream in the ones that are closer to the camera. Matter of fact, it will even stream out peds that are not within the camera's frustum, while this is happening, the interface of the object (ped) within the pool gets created and destroyed created every time the player streams in and out, respectively to make sure the count never exceeds 140 as it's the limit. This only applies to the pool object for ped, MTA keeps the CClientEntity* object for that ped within CGameEntityXRefManagerImpl::m_GameToClientMap.

Now, let's take a look at CPedPoolSA class:

// Generic container for pools
template <class T, class I, unsigned long MAX>
struct SPoolData
{
    typedef CFastHashMap<I*, T*> mapType;
    mapType                      map;
    T*                           array[MAX];
    unsigned long                ulCount;

private:
    friend class CPoolsSA;

    SPoolData() : map(MAX), ulCount(0UL)
    {
        for (unsigned int i = 0; i < MAX; ++i)
        {
            array[i] = NULL;
        }
    }
};

// Pools
typedef SPoolData<CVehicleSA, CVehicleSAInterface, MAX_VEHICLES> vehiclePool_t;
typedef SPoolData<CPedSA, CPedSAInterface, MAX_PEDS>             pedPool_t;
typedef SPoolData<CObjectSA, CObjectSAInterface, MAX_OBJECTS>    objectPool_t;
vehiclePool_t                                                    m_vehiclePool;
pedPool_t                                                        m_pedPool;
objectPool_t                                                     m_objectPool;

We have three very important memory pools:

Here, ulCount is the number of elements streamed in within the pool. array contains the interface pointers (pool object) and map contains the SA classes (MTA classes to be accurate) as value so we can search for them via their pool interfaces.

Solution

How about eliminating array and map members from CPoolSA class and also eliminating containers like CGameEntityXRefManagerImpl::m_GameToClientMap and accessing the value from only ONE array by an index without looping through it? So basically, we'll have only ONE array which will contain all the data we need, we only access it once by index instead of looping through it which means it will have O(1) time complexity with lil bit of shifting, no pun intended 馃槈. This is exactly why I created this PR to show you what MTA is capable of.

馃敶 NOTE: The code is in its very basic form so people can understand how I plan to do things. I've implemented the fast lookup for ped pool only. We can use the exact same technique for other pools since they have the same GTA pool class.

This is what the new CPoolSA looks like:

template <class T>
struct SClientEntity
{
    T* pEntity;  // Game SA class of MTA
    CClientEntity* pClientEntity;
};

template <class T, class I,unsigned long MAX>
struct SPoolData
{
    std::array <SClientEntity<T>, MAX> arrayOfClientEntities;
    unsigned long                ulCount;

private:
    friend class CPoolsSA;

    SPoolData() : map(MAX), ulCount(0UL)
    {
        for (unsigned int i = 0; i < MAX; ++i)
        {
            arrayOfClientEntities[i] = { nullptr, nullptr };
        }
    }
};

typedef SPoolData<CVehicleSA, CVehicleSAInterface, MAX_VEHICLES> vehiclePool_t;
typedef SPoolData<CPedSA, CPedSAInterface, MAX_PEDS>             pedPool_t;
typedef SPoolData<CObjectSA, CObjectSAInterface, MAX_OBJECTS>    objectPool_t;
vehiclePool_t                                                    m_vehiclePool;
pedPool_t                                                        m_pedPool;
objectPool_t                                                     m_objectPool;

arrayOfClientEntities is an member of SPoolData with size set to the max number of elements it can take. The size is 140 for ped pool, so the index is always going to be between 0-139.

When we have an interface to the ped object within the pool, we can get its index within the pool which is going to be within range of 0-139 which is exactly what we need for m_pedPool.arrayOfClientEntities.

CPedSAInterface * pInterface = pPed->GetPedInterface();

DWORD dwPedRef = GetPedRef((DWORD*)pInterface);

// Extract the element index from the handle  
DWORD dwElementIndexInPool = dwPedRef >> 8;

Once we have the index, we can use it to access both Game SA class and CClientEntity* by simply:

// Game SA object
CPedSA* pPed = m_pedPool.arrayOfClientEntities[dwElementIndexInPool].pEntity;

// CClientEntity* object
CClientPed * pClientPed = (CClientPed*)m_pedPool.arrayOfClientEntities[dwElementIndexInPool].pClientEntity;

So we don't even need to search one container for the object we need, we simply use the index.

This is how the GetPedRef function looks like:

int __cdecl CPools::GetPedRef(CPed *pPed)
{
  return (((char *)pPed - (char *)CPools::ms_pPedPool->m_pObjects) / 1988 << 8)
       + CPools::ms_pPedPool->m_byteMap[((char *)pPed - (char *)CPools::ms_pPedPool->m_pObjects) / 1988];
}

I used m_pedPool.arrayOfClientEntities in CPedPoolSA.cpp and eliminated the use ofmap and array members for ped pool.

One more thing, internally RenderWare is using a pool for RpClump. We can do the same thing there.

New features we can implement without losing good performance

  • Unlimited streaming memory with full lods
  • Custom physics which is a much more detailed topic but I'll list it here.
  • Adding animation support for custom objects that can have any N number of bones.
  • Improving performance bottlenecks of current asm hooks
  • Making certain functions object dependent instead of model dependent like #341
    Properly upgrading to DX11 which support XP SP3. We can use its instancing feature which will allow us to render thousands of objects without losing any FPS. The DX update is unofficial for XP so nope.

Also, the PR works. 鉁旓笍

Implement fast index based lookup for ped pool.
The code is tested and works nicely without breaking anything. This technique allows us to eliminate std::map and looping through arrays by using the index of the object pointer within the pool. Next step is to do the same for other pools.  This will give us a decent performance boost.

@saml1er saml1er changed the title Implement fast index based lookup for ped pool. WIP: Improve pool performance Sep 18, 2018

@saml1er saml1er added the enhancement label Sep 18, 2018

@saml1er saml1er added this to the 1.6 milestone Sep 18, 2018

@saml1er saml1er self-assigned this Sep 18, 2018

@ccw808

This comment has been minimized.

Copy link
Member

ccw808 commented Sep 18, 2018

Excellent work

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Sep 18, 2018

Excellent work

Thank you. :)

@Einheit-101

This comment has been minimized.

Copy link

Einheit-101 commented Sep 18, 2018

I appreciate this work, but the lua functions with worst performance are processLineOfSight, isLineOfSightClear and setElementData (as far as i know). Another serious performance issue is ped collisions, many peds streamed in with enabled collisions will melt any PC. Would any of these issues benefit from this?

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Sep 18, 2018

I appreciate this work, but the lua functions with worst performance are processLineOfSight, isLineOfSightClear

processLineOfSight and isLineOfSightClear are not the type of functions you can call every frame. They eat a lot of CPU because they need to do calculations which you cannot avoid but we can most probably rewrite these functions in C++ using SSE SMID instructions which is very good for doubles and floating points. It does not guarantee that it won't give you lag spikes if you execute it multiple times every frame.

The optimizations that can be done in C++, can also be done in Lua for these functions. Before executing them, you can check for player distance and know whether an element is on screen or not and use the same information across all resource. Do not call these functions in multiple resources.

Another serious performance issue is ped collisions, many peds streamed in with enabled collisions will melt any PC. Would any of these issues benefit from this?

Yes, collisions and all GTA events that MTA has hooked. MTA does some processing before their GTA functionality executes and it tries to search for the equivalent class object that was created for every entity that you create via Lua script. Collisions are the ones suffering the most from this. I've already explained how this one is fast because it does not require us to loop through an array to search for the class object, it calculates it instantly by getting the index from the address so we do don't have to execute the loop. Currently, MTA will do this at least 2 times for every single entity to check which one needs collision, and this is why people with weak computers have to suffer with lag. You'll notice the performance difference when you see it.

@Einheit-101

This comment has been minimized.

Copy link

Einheit-101 commented Sep 19, 2018

processLineOfSight and isLineOfSightClear are not the type of functions you can call every frame. They eat a lot of CPU because they need to do calculations which you cannot avoid but we can most probably rewrite these functions in C++ using SSE SMID instructions which is very good for doubles and floating points. It does not guarantee that it won't give you lag spikes if you execute it multiple times every frame.

Even a slight performance increase would be significant in some cases.

You'll notice the performance difference when you see it.

I will test my performance before and after it has been updated with 100 Peds and i will notice that indeed :D

@AlexTMjugador

This comment has been minimized.

Copy link

AlexTMjugador commented Sep 20, 2018

Maybe this is a stupid question, and I'm sorry if that's the case, but I'm curious about its answer 鈥 how do you guarantee that hash collisions won't occur; that is, that two different objects won't compute a same index for the pool array? Does GetPedRef have some property that guarantees that two different objects won't have the highest bits in common, or are collisions handled somewhere else?

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Sep 20, 2018

Does GetPedRef have some property that guarantees that two different objects won't have the highest bits in common, or are collisions handled somewhere else?

GetPedRef simply calculates the index of the object within the pool which is unique for every single ped object aka interface within ped pool as we are only using the pointer address of the interface. The size of ped pool is always 140 maximum. Have you ever seen more than 140 peds on screen even if you created them in one location? That simply won't happen because GTA will stream out peds farther away to keep the count to 140. The reason why there's no collision is because the object is deleted when it streams out and recreated when it streams in if there's space in pool.

Let's say if you have maximum amount (140) of peds streamed in, when you move in map, and there's a ped behind the camera which streams out, which makes the interface count to reach 139, so if any other ped object streams in as you move in map, it will take place of the streamed out object making the count reach back to 140. If you go to a place where there are no peds but only your local player, the count will remain 1.

Here's much more simple and better code to get the index (removed useless shifting):

DWORD CPoolsSA::GetPedPoolIndex(std::uint8_t* pInterface)
{
    DWORD dwAlignedSize = 1988;
    return ((pInterface - (std::uint8_t*)(*pPedPool)->m_pObjects) / dwAlignedSize);
}

Usage:

DWORD dwElementIndexInPool = GetPedIndexInPool (pInterface);

// Game SA object
CPedSA* pPed = m_pedPool.arrayOfClientEntities[dwElementIndexInPool].pEntity;

If you would like to know more, this is how CPoolSAInterface looks like:

// size of tPoolObjectFlags is 1 byte only
union tPoolObjectFlags {
    struct {
        unsigned char nId : 7;
        bool bEmpty : 1;
    };
private:
    unsigned char nValue;
};

template<class A, class B = A> class CPoolSAInterface {
public:
    // m_pObjects contains all interfaces. 140 maximum for ped objects.
    B                *m_pObjects;
    tPoolObjectFlags *m_byteMap;
    int               m_nSize;
    int               m_nFirstFree;
    bool              m_bOwnsAllocations;
    bool field_11;

    // Default constructor for statically allocated pools
    CPool() {
        // Remember to call CPool::Init to fill in the fields!
        m_pObjects = nullptr;
        m_byteMap = nullptr;
        m_nSize = 0;
        m_bOwnsAllocations = false;
    }
};

CPoolSAInterface < CPedSAInterface > pedPool;
CPoolSAInterface < CObjectSAInterface > objectPool;
CPoolSAInterface < CVehicleSAInterface > vehiclePool;
@AlexTMjugador

This comment has been minimized.

Copy link

AlexTMjugador commented Sep 20, 2018

That explanation and code snippets clarified things for me, indeed. Thanks! 馃槃

codenulls added some commits Sep 22, 2018

Fix crashes, bugs, and formatting
Some code was written by lucy.x and some was written by me. I combined them in one commit to fix the formatting.
@Pirulax

This comment has been minimized.

Copy link
Contributor

Pirulax commented Sep 29, 2018

Maybe the same idea(or at least something like this) could be implemented for element data search too, because the current way (get)element data works is it loops thru a table, and gets the data with the name.
But, for example setElementData first does a getElementData to get the current bSync, and then it does whats it supposed to.

@Einheit-101

This comment has been minimized.

Copy link

Einheit-101 commented Sep 30, 2018

Performance measurements on my side have shown that getElementData is extremely fast, like, immeasurable fast. Only setElementData is slow like hell.

@CodyJL

This comment has been minimized.

Copy link

CodyJL commented Oct 15, 2018

I just stumbled a pound this; why hold back some incredibly useful features just because of XP? I understand a small chunk of the userbase uses it but it makes no sense to not do something as important as that just because support for them should be kept.

That DX 11 feature could take mapping to the next level in MTA.

@LosFaul

This comment has been minimized.

Copy link

LosFaul commented Oct 15, 2018

Are there any statistics about how much players uses which OS?

@Dutchman101 maybe?

@Woovie

This comment has been minimized.

Copy link

Woovie commented Oct 15, 2018

Any user on Windows 7 and forward should support DirectX 11. 99% of the playerbase is on Windows version 6.1 (The same as 7) and forward, so I would argue it's a safe bet for DirectX 11.

@Woovie

This comment has been minimized.

Copy link

Woovie commented Oct 15, 2018

This is of course also dependent on their GPU. We have a lot of Intel GPU users.

@AlexTMjugador

This comment has been minimized.

Copy link

AlexTMjugador commented Oct 15, 2018

I'm a bit lost right now. Why are the last comments assuming this PR will, by itself, implement unlimited streaming or other fancy features that require DX11? In its current state, as I see it, it only eases up the terrain for future improvements on that matter, providing performance benefits on the process. So right now it's good for both XP, slow PC users and beefy PC users alike, because nothing on it implements new features. Therefore, it makes no sense to hold it back because people is still using Windows XP.

However, I'd understand that doing this for just the sake of performance, without future plans on doing things thanks to it, might not be the best way to invest the precious programmer time. Hopefully, there are other ways to improve performance more notably and easily.

@CodyJL

This comment has been minimized.

Copy link

CodyJL commented Oct 17, 2018

Reading the general ideas that saml1er has it seemed he planned to add DX11 support. This would allow true infinite streaming without absolutely crushing performance. There has to be a time where XP support is dropped because all it is doing now is bogging down development due to the fact that everything has to support xp and 7+ which is an incredibly limiting factor.

@AlexTMjugador

This comment has been minimized.

Copy link

AlexTMjugador commented Oct 17, 2018

I personally agree with the fact that XP support should not hamper the development of features, @CodyJL. My point is that right now this PR is not a "victim" of XP obsolescence. We don't need to blame people that uses that OS yet, and I think that the red warning that shows up in the main menu to users using the XP build does it job pretty well.

When the time comes to merge that fancy unlimited streaming features which require DX11, then we'd have another technical reason to not support Windows XP anymore, because it would not be possible to. But until then, if people still uses the XP build even though CEF is not supported on XP anymore... Well, I hope they really know what they're doing, but giving people freedom to use the OS they want until it is no longer possible or feasible seems like a good choice. It's not good to artificially impose restrictions. For example, XP is still a lightweight OS to use with MTA in virtual machines, which can aid in debugging sync problems in script development.

@CodyJL

This comment has been minimized.

Copy link

CodyJL commented Oct 17, 2018

Well what I'm saying is, saml1er crossed off DX11 support because he doesn't wanna kill XP support; what I'm saying is that XP should no longer really be supported because all it's doing is holding incredibly important features like the above back which could end up with some groundbreaking developmental advantages.

@ArranTuna

This comment has been minimized.

Copy link
Collaborator

ArranTuna commented Nov 9, 2018

I tested this PR, since it's about ped performance I imagined that FPS would be higher when there are many peds streamed in but I didn't notice any difference in FPS when I created 50 peds between this build and main.

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Nov 9, 2018

@ArranTuna It's not complete. If you take benchmarks, they'll be inaccurate so there's no point in testing for now.

I tested this PR, since it's about ped performance

It only improves performance of creating and destroying objects, vehicles, and peds. Also, it will improve performance for some specific events that MTA has hooked, you can find them in CClientGame.cpp. It certainly does not improve the FPS as this PR is meant to allow us to implement new features without losing performance which were not possible before, i.e., custom physics.

I didn't notice any difference in FPS when I created 50 peds between this build and main.

You shouldn't because I have not updated the PR. I stopped working on this for a while since I first want to lift streaming memory limits for GTA and custom map objects.

@ArranTuna

This comment has been minimized.

Copy link
Collaborator

ArranTuna commented Nov 9, 2018

Cool, mention me if you need something testing.

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Nov 16, 2018

So this will in theory allow someone to add custom physics to MTA?

Yes. It's still very challenging to implement custom physics.

Ok so I tested this with another player and we did random stuff which could cause crashes, I was on the server for over an hour but most of my time was spent idle (had to fix deprecated functions which for some reason this custom build warns me about but not the main builds) so yeah I didn't crash but the person I tested with said that when he got kicked from the server for high ping his game crashed and I've asked him to check dump files folder and core.log but both had no info on the crash nothing in event viewer either so I dunno about that, ok so I tested this and can reproduce crash on disconnect. Just do host game, default resources, then disconnect, game crashes. And yeah there is no dump or log about this crash any where that I can see.

I am able to reproduce it. It happens when you host a game, getting kicked or leaving the server will simply make you exit MTA. I'll fix it. Thanks.

@CrosRoad95

This comment has been minimized.

Copy link
Contributor

CrosRoad95 commented Nov 16, 2018

So this will in theory allow someone to add custom physics to MTA?

my future pr will allow to create own custom physics ( give option to get collision data of each model ), and could be introduced faster then Saml1er feature.

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Nov 16, 2018

@ArranTuna Fixed it. It wasn't actually a crash, it was simply the console causing MTA to exit on disconnect if you're hosting a server.

@ArranTuna

This comment has been minimized.

Copy link
Collaborator

ArranTuna commented Nov 16, 2018

I've done some new testing and is good. I think this is suitable to be merged so that all the players who set their build mode to nightly can get it since that is the whole idea of the nightly option right? To test this new stuff more.

@CodyJL

This comment has been minimized.

Copy link

CodyJL commented Nov 16, 2018

Challenging does not equal impossible yes?

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Nov 17, 2018

I've done some new testing and is good. I think this is suitable to be merged so that all the players who set their build mode to nightly can get it since that is the whole idea of the nightly option right? To test this new stuff more.

Sure, I'll do a little more testing tonight and merge it, hopefully.

Challenging does not equal impossible yes?

Yeah, it means that it will take more time to implement. I most probably cannot do this on my own as it's a very time-consuming task. I will instead focus on documenting it, so other devs can join and help me finish it.

@CodyJL

This comment has been minimized.

Copy link

CodyJL commented Nov 17, 2018

Well shoot I may start studying bullet physics some time soon then, the physics system in SA is a major performance issue.

@ArranTuna ArranTuna added the tested label Dec 6, 2018

@qaisjp qaisjp removed the tested label Dec 8, 2018

{
DEBUG_TRACE("BOOL CGameSA::InitLocalPlayer( )");

// Added by ChrML - Looks like it isn't safe to call this more than once but mod code might do
static bool bAlreadyInited = false;
/*static bool bAlreadyInited = false;

This comment has been minimized.

Copy link
@qaisjp

qaisjp Dec 12, 2018

Member

Should this bit of code be removed?

This comment has been minimized.

Copy link
@saml1er

saml1er Dec 12, 2018

Author Member

You're right. I'll skim the changes before merging and make possible improvements.

@@ -55,8 +55,6 @@ typedef DWORD HWMENCODE; // WMA encoding handle
#define BASS_WMA_ENCODE_SCRIPT 0x20000 // set script (mid-stream tags) in the WMA encoding
#define BASS_WMA_ENCODE_QUEUE 0x40000 // queue data to feed encoder asynchronously
#define BASS_WMA_ENCODE_SOURCE 0x80000 // use a BASS channel as source
#define BASS_WMA_ENCODE_VOICE 0x100000 // WMA Voice

This comment has been minimized.

Copy link
@qaisjp

qaisjp Dec 12, 2018

Member

Unintended changes to vendor library?

This comment has been minimized.

Copy link
@Dutchman101

Dutchman101 Dec 12, 2018

Collaborator

He just needs to merge master branch, @qaisjp

This comment has been minimized.

Copy link
@qaisjp

qaisjp Dec 12, 2018

Member

the change won't be shown here unless it's a manual one.

This comment has been minimized.

Copy link
@saml1er

saml1er Dec 12, 2018

Author Member

Yes, unintended. However, I think that Dutchman is right because this appeared right after this commit: bf99a83

This comment has been minimized.

Copy link
@qaisjp

qaisjp Dec 12, 2018

Member

git doesn't randomly introduce changes from other branches. It was introduced in this commit d43cd09

This comment has been minimized.

Copy link
@saml1er

saml1er Dec 13, 2018

Author Member

Done

@saml1er saml1er changed the title WIP: Improve pool performance Improve pool performance Dec 13, 2018

@saml1er saml1er merged commit c55ef42 into master Dec 13, 2018

0 of 4 checks passed

continuous-integration/appveyor/branch Waiting for AppVeyor build to complete
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
@qaisjp

This comment has been minimized.

Copy link
Member

qaisjp commented Dec 14, 2018

馃憦

@TemaSM

This comment has been minimized.

Copy link

TemaSM commented Dec 14, 2018

Good job 馃憤

@saml1er

This comment has been minimized.

Copy link
Member Author

saml1er commented Dec 14, 2018

Gracias 馃槃

@botder botder deleted the fast_lookup_byindex branch Jan 12, 2019

@saml1er saml1er referenced this pull request Apr 15, 2019

Open

New Limit Adjuster #869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.